Phil Long
Google
plong@google.com



Selected Publications

P. M. Long and P. L. Bartlett. Sharpness-Aware Minimization and the edge of stability. JMLR, to appear. (Code.)

P. L. Bartlett, P. M. Long and O. Bousquet. The dynamics of Sharpness-Aware Minimization: bouncing across ravines and drifting towards wide minima. JMLR, 24(316):1-36, 2023.

N. Chatterji and P. M. Long. Deep linear networks can benignly overfit when shallow ones do. JMLR, 24(117):1-39, 2023.

P. M. Long and R. A. Servedio. The perils of being unhinged: On the accuracy of classifiers minimizing a noise-robust convex loss. Neural Computation, 34(6):1488-1499, 2022.

N. Chatterji and P. M. Long. Foolish crowds support benign overfitting. JMLR, 23(125):1-12, 2022.

P. M. Long. Superlinear integrality gaps for the minimum majority problem. SIAM Journal on Discrete Mathematics, 35(4):3004-3016, 2021.

N. Chatterji, P. M. Long and P. L. Bartlett. The interplay between implicit bias and benign overfitting in two-layer linear networks. JMLR, 23(263):1-48, 2022.

N. Chatterji, P. M. Long and P. L. Bartlett. When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations? COLT'21.

N. Chatterji, P. M. Long and P. L. Bartlett. When does gradient descent with logistic loss find interpolating two-layer networks? JMLR, 22(159):1-48, 2021.

P. L. Bartlett and P. M. Long. Failures of model-dependent generalization bounds for least-norm interpolation. JMLR, 22(204):1-15, 2021.

N. Chatterji and P. M. Long. Finite-sample analysis of interpolating linear classifiers in the overparameterized regime. JMLR, 22(129):1-30, 2021.

D. Zou, P. M. Long and Q. Gu. On the global convergence of training deep linear resnets. ICLR'20.

N. Chatterji, P. L. Bartlett and P. M. Long. Oracle lower bounds for stochastic gradient sampling algorithms. Bernoulli, 28(2):1074-1092, 2022.

P. M. Long and R. J. Long. On the complexity of proper distribution-free learning of linear classifiers. ALT'20.

P. L. Bartlett, P. M. Long, G. Lugosi, and A. Tsigler. Benign overfitting in linear regression. Proceedings of the National Academy of Sciences, 117(48):30063-30070, 2020.

P. M. Long and H. Sedghi. Generalization bounds for deep convolutional neural networks. ICLR'20.

P. M. Long and H. Sedghi. On the effect of the activation function on the distribution of hidden nodes in a deep network. Neural Computation, 31(12): 2562-2580, 2019.

A. De, P. M. Long and R. A. Servedio. Density estimation for shift-invariant multidimensional distributions. ITCS'19.

A. De, P. M. Long and R. A. Servedio. Learning sums of independent random variables with sparse collective support. JMLR, 21(221):1-79, 2020.

H. Sedghi, V. Gupta, and P. M. Long. The singular values of convolutional layers. ICLR'19.

P. L. Bartlett, D. P. Helmbold and P. M. Long. Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks. Neural Computation, 31: 477-502, 2019.

P. M. Long. New bounds on the price of bandit feedback for mistake-bounded online multiclass learning. Theoretical Computer Science, 808: 159-163, 2020. (Correction by J. Geneson.)

D. P. Helmbold and P. M. Long. Surprising properties of dropout in deep networks. JMLR, 18(200):1-28, 2018.

D. Legrand, et al. Visual product discovery. KDD Workshop: Machine Learning Meets Fashion, 2016.

D. P. Helmbold and P. M. Long. On the inductive bias of dropout. JMLR, 16:3403-3454, 2015.

P. Awasthi, M. F. Balcan and P. M. Long. The power of localization for efficiently learning linear separators with noise. JACM, 63(6): 50:1-50:27, 2017.

P. M. Long and R. A. Servedio. Consistency versus realizable H-consistency for multiclass classification. ICML'13.

M. F. Balcan and P. M. Long. Active and passive learning of linear separators under log-concave distributions. COLT'13.

P. M. Long and R. A. Servedio. On the weight of halfspaces over Hamming balls. SIAM Journal on Discrete Mathematics, 28(3): 1035-1061, 2014.

D. P. Helmbold and P. M. Long. New bounds for learning intervals with implications for semi-supervised learning. COLT'12.

P. M. Long and R. A. Servedio. Algorithms and hardness results for parallel large margin learning. JMLR, 14:3073-3096, 2013.

P. M. Long and R. A. Servedio. Learning large-margin halfspaces with more malicious noise. NIPS'11.

D. P. Helmbold and P. M. Long. On the necessity of irrelevant variables. JMLR, 13:2145-2170, 2012.

N. H. Bshouty and P. M. Long. Finding planted partitions in nearly linear time using arrested spectral clustering. ICML'10.

P. M. Long and R. A. Servedio. Restricted Boltzmann Machines are hard to approximately evaluate or simulate. ICML'10.

N. H. Bshouty and P. M. Long. Linear classifiers are nearly optimal when hidden variables have diverse effects. Machine Learning, 86(2): 209-231, 2012.

A. R. Klivans, P. M. Long and R. A. Servedio. Learning halfspaces with malicious noise. JMLR, 10: 2715-2740, 2009.

A. R. Klivans, P. M. Long and A. K. Tang. Baum's algorithm learns intersections of halfspaces with respect to log-concave distributions. RANDOM'09.

P. M. Long and R. A. Servedio. Random classification noise defeats all convex potential boosters. Machine Learning, 78(3): 287-304, 2010.

P. M. Long and R. A. Servedio. Adaptive martingale boosting. NIPS'08.

P. M. Long and R. A. Servedio. Boosting the area under the ROC curve. NIPS'07.

Z. Barutcuoglu, P. M. Long and R. A. Servedio. One-pass boosting. NIPS'07.

P. M. Long, R. A. Servedio and H. U. Simon. Discriminative learning can succeed where generative learning fails. Information Processing Letters, 103(4):131-135, 2007.

N. H. Bshouty, Y. Li and P. M. Long. Using the doubling dimension to analyze the generalization of learning algorithms. Journal of Computer and System Sciences, 75(6):323-335, 2009.

P. M. Long and R. A. Servedio. Attribute-efficient learning of decision lists and linear threshold functions under unconcentrated distributions. NIPS'06.

O. Dekel, P. M. Long and Y. Singer. Online learning of multiple tasks with a shared loss. JMLR, 8:2233-2264, 2007.

P. Gross, A. Boulanger, M. Arias, et al. Predicting electricity distribution feeder failures using machine learning susceptibility analysis. IAAI'06.

P. M. Long, V. Varadan, S. Gilman, M. Treshock and R. A. Servedio. Unsupervised evidence integration. ICML'05. (Supplementary material, including an error and a fix.)

P. M. Long and R. A. Servedio. Martingale boosting. COLT'05.

S. W. Nam, et al. Molecular changes from dysplastic nodule to hepatocellular carcinoma through gene expression profiling. Hepatology, 42(4): 809-818, 2005.

P. M. Long and X. Wu. Mistake bounds for maximum entropy discrimination. NIPS'04.

V. B. Vega, et al. Mutational dynamics of the SARS coronavirus in cell culture and human populations isolated in 2003. BMC Infectious Diseases, 4:32, 2004.

S. Y. Neo, et al. Identification of discriminators of hepatocellular carcinoma by gene expression profiling using a minimal dataset approach. Hepatology, 39(4): 944-953, 2004.

N. O. Fortunel, H. H. Otu, H. H. Ng, et al. Comment on "Stemness: Transcriptional Profiling of Embryonic and Adult Stem Cells" and "A Stem Cell Molecular Signature". Science, 302(5644): 393, 2003.

S. Dasgupta and P. M. Long. Boosting with diverse base classifiers. COLT'03.

S. Dasgupta and P. M. Long. Performance guarantees for hierarchical clustering. Journal of Computer and System Sciences, 70(4):555-569, 2005.

C. Sotiriou, et al. Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proceedings of the National Academy of Sciences, 100(18): 10393-10398, 2003.

Y. Ruan, C. L. Wei, et al. Comparative full-length genome sequence analysis of 14 SARS coronavirus isolates and common mutations associated with putative origins of infection. Lancet, 361:1779-1785, 2003.

P. M. Long. An upper bound on the sample complexity of PAC learning halfspaces with respect to the uniform distribution. Information Processing Letters, 87(5): 229-234, 2003.

P. M. Long and V. B. Vega. Boosting and microarray data. Machine Learning, 52(1):31-44, 2003.

P. M. Long. Minimum majority classification and boosting. AAAI'02.

P. M. Long. Using the pseudo-dimension to analyze approximation algorithms for integer programming. WADS'01. (Correction.)

P. M. Long. On agnostic learning with {0,*,1}-valued and real-valued hypotheses. COLT'01.

S. Dasgupta, W. S. Lee and P. M. Long. A theoretical analysis of query selection for collaborative filtering. Machine Learning, 51: 283-298, 2003.

S. Ben-David, P. M. Long and Y. Mansour. Agnostic boosting. COLT'01.

S. Ben-David, N. Eiron and P. M. Long. On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences, 66(3):496-514, 2003.

Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1-3):361-387, 2002.

Y. Li, P. M. Long and A. Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516-527, 2001.

Y. Li, P. M. Long and A. Srinivasan. The one-inclusion graph algorithm is near-optimal for the prediction model of learning. IEEE Transactions on Information Theory, 47(3):1257-1261, 2001.

P. M. Long. The complexity of learning according to two models of a drifting environment. Machine Learning, 37(3):337-354, 1999.

P. M. Long. Efficient algorithms for learning functions with bounded variation. Information and Computation, 188(1):99-115, 2004.

N. Abe, A. W. Biermann, and P. M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4): 263-293, 2003.

P. Auer, P. M. Long and A. Srinivasan. Approximating hyper-rectangles: learning and pseudorandom sets. Journal of Computer and System Sciences, 57(3):376-388, 1998.

P. M. Long, A. I. Natsev and J. S. Vitter. Text compression via alphabet re-representation. Neural Networks, 12(4-5):755-776, 1999.

P. M. Long. Improved bounds about on-line learning of smooth functions of a single variable. Theoretical Computer Science, 241(1-2):25-35, 2000.

D. T. Hoang, P. M. Long and J. S. Vitter. Dictionary selection using partial matching. Information Sciences, 119(1-2):57-72, 1999.

D. T. Hoang, P. M. Long and J. S. Vitter. Efficient cost measures for motion compensation at low bit rates. IEEE Transactions on Circuits and Systems for Video Technology, 8(4):488-500, 1998.

P. M. Long and L. Tan. PAC learning axis-aligned rectangles with respect to product distributions from multiple-instance examples. Machine Learning, 30(1):7-22, 1998.

R. D. Barve and P. M. Long. On the complexity of learning from drifting distributions. Information and Computation, 138(2):101-123, 1997.

P. L. Bartlett and P. M. Long. Prediction, learning, uniform convergence, and scale-sensitive dimensions. Journal of Computer and System Sciences, 56(2):174-190, 1998.

P. Krishnan, P. M. Long and J. S. Vitter. Adaptive disk spindown via optimal rent-to-buy in probabilistic environments. Algorithmica, 23(1):31-56, 1999.

P. L. Bartlett, P. M. Long and R. C. Williamson. Fat shattering and the learnability of real-valued functions. Journal of Computer and System Sciences, 52(3):434-452, 1996.

P. Auer and P. M. Long. Structural results for online learning models with and without queries. Machine Learning, 36(3):147-181, 1999.

P. M. Long. On the sample complexity of PAC learning halfspaces against the uniform distribution. IEEE Transactions on Neural Networks, 6(6):1556-1559, 1995.

P. Auer, P. M. Long, W. Maass and G. J. Woeginger. On the complexity of function learning. Machine Learning, 18(2):187-236, 1995.

P. M. Long. Halfspace learning, linear programming, and nonmalicious distributions. Information Processing Letters, 51:245-250, 1994.

N. Cesa-Bianchi, P. M. Long and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.

D. P. Helmbold, N. Littlestone and P. M. Long. Online learning with linear loss constraints. Information and Computation, 161(2):140-171, 2000.

D. P. Helmbold, N. Littlestone and P. M. Long. Apple tasting. Information and Computation, 161(2):85-139, 2000.

S. Ben-David, N. Cesa-Bianchi, D. Haussler and P. M. Long. Characterizations of the learnability of {0,...,n}-valued functions. Journal of Computer and System Sciences, 50(1):74-86, 1995.

D. Kimber and P. M. Long. On-line learning of smooth functions of a single variable. Theoretical Computer Science, 148(1):141-156, 1995.

D. P. Helmbold and P. M. Long. Tracking drifting concepts by minimizing disagreements. Machine Learning, 14(1):27-46, 1994.

N. Littlestone, P. M. Long and M. K. Warmuth. On-line learning of linear functions. Computational Complexity, 5:1-23, 1995.

D. Haussler and P. M. Long. A generalization of Sauer's Lemma. Journal of Combinatorial Theory, Series A, 71(2):219-240, 1995.

P. M. Long and M. K. Warmuth. Composite geometric concepts and polynomial predictability. Information and Computation, 113(2):203-252, 1994.




Workshops

STOC'13 Workshop on New (Theoretical) Challenges in Machine Learning



Music

Twenty

Wading

Still Alive, by Jonathan Coulton

Black Hole Sun, by Chris Cornell

Black Hole Sun (again)

Bach Invention #13

Fugue in Classic Style, by Miaskovsky

Debussy Prelude #1 for Piano

Debussy Prelude #2 for Piano