Learning better while sending less: Communication-efficient online semi-supervised learning in client-server settings

Han Xiao, Shou De Lin, Mi Yen Yeh, Phillip B. Gibbons, Claudia Eckert

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider a novel distributed learning problem: A server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of prioritization hints it sends back to the client. The goal is for the server to learn a good model of all the client data from the labeled and unlabeled data it receives. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. However, previous approaches are not designed for the client-server setting and offer no means of reducing communication costs. We present a novel framework for solving this learning problem in an effective and communication-efficient manner. On the server side, our solution combines two diverse learners working collaboratively, yet in distinct roles, on the partially labeled data stream. A compact, online graph-based semi-supervised learner is used to predict labels for the unlabeled data arriving from the clients. Samples from this model are used as ongoing training for a linear classifier. On the client side, our solution prioritizes data based on an active-learning metric that favors instances that are close to the classifier's decision hyperplane and yet far from each other. To reduce communication, the server sends the classifier's weight vector to the client only periodically. Experimental results on real-world data sets show that this particular combination of techniques outperforms other approaches and, in particular, often outperforms (communication-expensive) approaches that send all the data to the server.
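The client-side prioritization described above can be sketched as a greedy selection that trades off uncertainty (small distance to the decision hyperplane defined by the server's most recently sent weight vector) against diversity (large distance to instances already selected). The function below is a minimal illustration of this kind of metric, not the paper's exact scoring rule; the trade-off weight `lam` and the greedy scheme are assumptions for the sketch.

```python
import numpy as np

def prioritize(X, w, b, k, lam=0.5):
    """Greedily pick k instances from X that are close to the hyperplane
    w.x + b = 0 (uncertain) yet far from each other (diverse).
    Illustrative sketch only; lam and the greedy rule are assumptions."""
    # Distance of each instance to the decision hyperplane.
    margins = np.abs(X @ w + b) / np.linalg.norm(w)
    chosen = []
    candidates = list(range(len(X)))
    while len(chosen) < k and candidates:
        def score(i):
            s = margins[i]  # uncertainty: smaller margin -> lower (better) score
            if chosen:
                # Diversity: reward distance to the nearest already-chosen point.
                s -= lam * min(np.linalg.norm(X[i] - X[j]) for j in chosen)
            return s
        best = min(candidates, key=score)
        chosen.append(best)
        candidates.remove(best)
    return chosen

# Example: with w = (1, 0), margins are just |x1 + b|, so the two
# lowest-margin (yet mutually distant) points are selected first.
X = np.array([[0.1, 0.0], [2.0, 0.0], [0.2, 0.1], [-1.5, 0.5]])
print(prioritize(X, w=np.array([1.0, 0.0]), b=0.0, k=2))
```

Only the selected indices would be transmitted to the server, and the client keeps scoring locally until the next periodic weight-vector update arrives.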

Original language: English
Title of host publication: Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Editors: Gabriella Pasi, James Kwok, Osmar Zaiane, Patrick Gallinari, Eric Gaussier, Longbing Cao
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467382731
State: Published - 2 Dec 2015
Event: IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015 - Paris, France
Duration: 19 Oct 2015 - 21 Oct 2015

Publication series

Name: Proceedings of the 2015 IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015

Conference

Conference: IEEE International Conference on Data Science and Advanced Analytics, DSAA 2015
Country/Territory: France
City: Paris
Period: 19/10/15 - 21/10/15

Keywords

  • big data
  • distributed system
  • online learning
  • semi-supervised learning
