Hello and welcome to the second edition of the “Data Talks” segment of the Data Science student blog. Today we have the honor of interviewing Piotr Zwiernik, who is an assistant professor at Universitat Pompeu Fabra. Professor Zwiernik was recently awarded the Beatriu de Pinós grant from the Catalan Agency for Management of University and Research Grants. In the Data Science Master’s Program he teaches the maths brush-up and the convex optimization part of the first-term class “Deterministic Models and Optimization”. Furthermore, he is one of the leading researchers in the field of Gaussian Graphical Models and algebraic statistics. We discuss his personal path, his fascination with algebraic statistics, and the epistemological question of low-dimensional structures in nature.
Robert: First of all, thank you very much for taking the time to talk to us. So let us start with the first question: Prof. Zwiernik, how did you get to where you are?
P. Z.: The pleasure is mine. Originally, I wanted to be a journalist and decided to study Economics. During this time I worked with the classic Microeconomics textbook by Mas-Colell (one of BGSE’s founding fathers) and realized that I wasn’t that bad at maths. I chose to also study mathematics in Poland. I soon came to realize that I loved all algebraic subjects, especially algebraic geometry. Since I always wanted to combine subjects, I searched for different paths to synthesize algebra and statistics.
In 1998 the “magician” Persi Diaconis and Bernd Sturmfels, one of my later co-authors, published a beautiful paper that introduced an efficient way of sampling from conditional distributions given the value of a sufficient statistic. Computational algebra provides a set of moves that connects all points of the sample space. Afterwards, one just has to run a standard MCMC algorithm such as Metropolis-Hastings. The beauty of this idea led me to focus on the geometry of graphical models during my PhD studies in statistics at the University of Warwick.
At UC Berkeley, during one of my many post-docs, I was actually able to meet Professor Diaconis in person and had the opportunity to participate in Michael Jordan’s working group. In Jordan’s group we read many papers that covered completely different topics, but we were still able to find deep connections. This helped me realize what people were really interested in. Nowadays, I tackle problems that lie at the intersection of algebraic geometry and high-dimensional statistics.
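For readers who want to see the sampling idea from the 1998 paper in action, here is a minimal sketch of our own (it is not code from the paper). It assumes the simplest setting we know of: two-way contingency tables under the independence model, where the sufficient statistics are the row and column sums and the connecting moves add +1 and -1 on the corners of a 2x2 sub-table. The function names `basic_move_step` and `sample_tables` are hypothetical, and the target is assumed to be the hypergeometric distribution, which is proportional to one over the product of the factorials of the cell counts.

```python
import numpy as np


def basic_move_step(table, rng):
    """One Metropolis-Hastings step that keeps all row and column sums fixed."""
    rows, cols = table.shape
    # Ordered pairs of distinct rows and columns; swapping i and k flips the
    # sign of the move, so both directions are proposed with equal probability.
    i, k = rng.choice(rows, size=2, replace=False)
    j, l = rng.choice(cols, size=2, replace=False)
    # Proposed move: +1 on (i, j) and (k, l), -1 on (i, l) and (k, j).
    if table[i, l] == 0 or table[k, j] == 0:
        return table  # the move would create a negative count, so stay put
    # Ratio of hypergeometric probabilities (assumed target distribution).
    ratio = (table[i, l] * table[k, j]) / ((table[i, j] + 1) * (table[k, l] + 1))
    if rng.random() >= min(1.0, ratio):
        return table  # proposal rejected
    new_table = table.copy()
    new_table[i, j] += 1
    new_table[k, l] += 1
    new_table[i, l] -= 1
    new_table[k, j] -= 1
    return new_table


def sample_tables(start, n_steps, seed=0):
    """Run the chain for n_steps and return the final table."""
    rng = np.random.default_rng(seed)
    table = np.array(start, dtype=np.int64)
    for _ in range(n_steps):
        table = basic_move_step(table, rng)
    return table


observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]
final = sample_tables(observed, n_steps=2000, seed=1)
```

Every table the chain visits has the same margins as `observed`, which is exactly the conditioning on the sufficient statistics. For more complicated models the set of connecting moves is no longer obvious, and that is where the computational algebra comes in.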
Robert: How did your Economics background affect the path that you took and how does it influence your way of thinking?
P. Z.: First of all, Economics introduced me to the field of Econometrics and statistics in general. Doing research in Econometrics also showed me the difference between theory and working with real data. We should never forget: working with data is hard. Really hard.
One of my personal heroes is Terry Speed, who went through a very interesting transformation. He started off by studying pure geometry. Afterwards he got interested in statistics and cumulants. Only at the end did he decide to work with real data, and he contributed research in the field of Bioinformatics. I can imagine traveling a similar path.
Robert: Can you tell us more about your post-doc time in the US? How did it compare to your experiences in Europe and did you feel pressured to succeed in the academic world?
P. Z.: In my mind, being a post-doc is a time of exploration. During mine I realized that everything is connected. Hence, no time or effort is ever completely wasted. There is always something you can take away from studying a topic. It is extremely hard to do research if you constantly want to prove yourself. Over the years I observed many colleagues who burned out exactly because of this. The take-away message: free exploration of the world is key, and you should not stress yourself too much.
The academic world in the US is not very homogeneous. Berkeley, for example, has a strong focus on optimization and Computer Science. Chicago and Seattle, on the other hand, specialize more in structural and algebraic problems, that is, in how things really work.
Nandan: How do you define “making progress in understanding”?
P. Z.: This is a tough question and I will try to answer it with a recent example: totally positive distributions. One example of these can be found in so-called one-factor models, which are commonly used in psychometrics. Specific abilities (e.g. logical reasoning and spatial understanding) are assumed to be conditionally independent given a latent variable (e.g. intelligence). Usually these abilities are also positively correlated, and this phenomenon seems to be characteristic of many natural systems. Sometimes it is only approximate (like Gaussianity), but still it is omnipresent. For example, two undergraduate students of mine studied high-dimensional portfolio selection and were able to show that the covariance matrix of stocks exhibits strong traits of total positivity. This is only one example. Every few years it feels like I reach a new milestone of understanding.
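To make the one-factor example a bit more tangible, here is a small numerical sketch of our own. It assumes a Gaussian one-factor model whose covariance is the outer product of the loadings plus a diagonal noise matrix, and it uses the standard fact that a Gaussian distribution is totally positive (MTP2) exactly when all off-diagonal entries of its inverse covariance matrix are non-positive. The loadings, noise variances and function names are made up for illustration.

```python
import numpy as np


def one_factor_covariance(loadings, noise_var):
    """Covariance of observed abilities: loadings * loadings^T + diag(noise)."""
    loadings = np.asarray(loadings, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    return np.outer(loadings, loadings) + np.diag(noise_var)


def has_nonpositive_offdiag_precision(cov, tol=1e-10):
    """Check the Gaussian MTP2 criterion on the inverse covariance matrix."""
    precision = np.linalg.inv(cov)
    off_diag = precision[~np.eye(precision.shape[0], dtype=bool)]
    return bool(np.all(off_diag <= tol))


# Hypothetical loadings of four abilities on one latent factor.
positive_cov = one_factor_covariance([0.9, 0.7, 0.5, 0.8], [0.3, 0.5, 0.6, 0.4])
mixed_cov = one_factor_covariance([0.9, -0.7, 0.5, 0.8], [0.3, 0.5, 0.6, 0.4])

print(has_nonpositive_offdiag_precision(positive_cov))
print(has_nonpositive_offdiag_precision(mixed_cov))
```

With all loadings positive, every pair of abilities is positively correlated and the criterion holds. Flipping the sign of one loading breaks it, which shows that total positivity is a real restriction rather than something every covariance matrix satisfies.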
Robert: Can you also give us a philosophical and intuitive understanding of your fascination for algebraic statistics?
P. Z.: Algebra is all about finding patterns in nature and in different subfields of mathematics. The world is full of symmetrical relationships, which we try to detect. In doing so, algebra simplifies and unifies the language researchers use. Personally, I am interested in models that exhibit this symmetry property.
Gaussian Graphical Models are just one instance of this class of models. Another beautiful property of them is that there are many different ways in which we can express them:
- They can be formulated from a statistical physics point of view. The joint distribution can be written as the product of factor functions.
- Furthermore, we can also express them with the help of classical graph theory. Nodes represent variables. Missing edges represent conditional independence relationships. Each graph represents a specific factorization of the joint distribution.
- Finally, they can also be studied as parametrized distributions such as in the exponential family setting. This allows us to make use of ideas from the theory of convexity, combinatorics and decomposable graphs.
There are endless connections, which brings me back to one of my previous points: everything is related. Making connections and exploiting links is how we make progress.
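The graph-theoretic view of Gaussian Graphical Models can be sketched in a few lines. The example below is ours and assumes a toy chain graph on three variables (0 - 1 - 2). It relies on the standard fact that, for a Gaussian distribution, a zero entry in the inverse covariance (precision) matrix means the two variables are conditionally independent given all the others. The matrix values and function names are chosen purely for illustration.

```python
import numpy as np


def partial_correlations(precision):
    """Partial correlation of each pair given all remaining variables."""
    precision = np.asarray(precision, dtype=float)
    scale = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


def edges_from_precision(precision, tol=1e-10):
    """Edges of the graph: pairs with a non-zero off-diagonal precision entry."""
    precision = np.asarray(precision, dtype=float)
    n = precision.shape[0]
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if abs(precision[i, j]) > tol
    ]


# Precision matrix of a chain 0 - 1 - 2: the (0, 2) entry is zero.
chain_precision = np.array(
    [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
)
chain_covariance = np.linalg.inv(chain_precision)

print(edges_from_precision(chain_precision))
print(chain_covariance[0, 2])
print(partial_correlations(chain_precision)[0, 2])
```

Variables 0 and 2 are marginally correlated, since the covariance entry is non-zero, but their partial correlation given variable 1 vanishes. That is the missing edge in the graph.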
Robert: In the previous year and in our convex optimization class we spoke a lot about penalized likelihood models and the underlying assumption of sparsity in nature. Do you think this assumption holds?
P. Z.: Not in the sense that the causal relationship between some variables has to be 0. But I believe in the hypothesis that nature’s underlying factors lie on a lower-dimensional manifold. Nature tends to fall in love with low-dimensional structures, which are not generic but very special. One such example is the Fibonacci Sequence. Low-dimensionality can have many different faces (e.g. sparsity or low-rank matrix factorizations). But often people just assume the form that is computationally tractable.
Nandan and Robert: You are also the organizer of the Statistics and Operations Research seminar at UPF. Can you tell us more about it, your future plans and the importance of a flourishing research community?
P. Z.: First of all, we have a very limited budget. This means that we are very much focused on inviting European researchers. Everyone knows Pompeu Fabra for its outstanding research in Economics. But not many people know us for our work in statistics or mathematics. Our overall goal is to change this. And our strategy is twofold: First, we invite strong senior researchers who are great speakers. The speakers are exposed to the research group and the group can benefit from external stimuli. Second, we also invite young researchers who work in fields close to our own. This gives them the opportunity to collaborate with us in the future and also helps us to increase our reach.
In the past we have had the fortune to host prominent guests such as Caroline Uhler (MIT), Wilfrid Kendall (Warwick) and Bin Yu (Berkeley). So we are slowly but surely progressing. For the future we want to invite more researchers. Also, the newly assembled Data Science Center is going to generate new sources of funding. We are hoping to be able to invite at least one very good speaker from the US per year.
Having a flourishing local community is extremely important! Our small group works extremely well together. For example, Eulalia Nualart, who is a theoretical probabilist, has written multiple papers with Christian Brownlees, who mainly works on time series analysis. Mihalis Markakis and Gabor Lugosi started working on bandit problems together, and in general there is a friendly and open atmosphere at UPF. All of us are doing very distinct things, so there are many opportunities to learn from each other.
Nandan and Robert: Let’s wrap this up by asking for your ultimate advice for all PhD students and aspiring Data Scientists. What do you wish you had known earlier?
P. Z.: Don’t think about research as a way to prove yourself, but as an exploration process of the world. David Blackwell, one of the brightest statisticians ever, once said: “Basically, I’m not interested in doing research and I have never been. […] I’m interested in understanding, which is quite a different thing.”
Nandan and Robert: Thank you very much, Professor Zwiernik!
References
Diaconis, Persi, and Bernd Sturmfels. “Algebraic algorithms for sampling from conditional distributions.” The Annals of Statistics 26.1 (1998): 363-397.
Dudoit, Sandrine, ed. Selected Works of Terry Speed. Springer Science & Business Media, 2012.