About: To reduce the training time of large-scale Deep Neural Networks (DNNs), Deep Learning (DL) scientists have started to explore parallelization strategies like data-parallelism, model-parallelism, and hybrid-parallelism. While data-parallelism has been extensively studied and developed, several problems exist in realizing model-parallelism and hybrid-parallelism efficiently. Four major problems we focus on are: 1) defining a notion of a distributed model across processes, 2) implementing forward/back-propagation across process boundaries that requires explicit communication, 3) obtaining parallel speedup on an inherently sequential task, and 4) achieving scalability without losing out on a model’s accuracy. To address these problems, we create HyPar-Flow—a model-size and model-type agnostic, scalable, practical, and user-transparent system for hybrid-parallel training by exploiting MPI, Keras, and TensorFlow. HyPar-Flow provides a single API that can be used to perform data, model, and hybrid parallel training of any Keras model at scale. We create an internal distributed representation of the user-provided Keras model, utilize TF’s Eager execution features for distributed forward/back-propagation across processes, exploit pipelining to improve performance and leverage efficient MPI primitives for scalable communication. Between model partitions, we use send and recv to exchange layer-data/partial-errors while allreduce is used to accumulate/average gradients across model replicas. Beyond the design and implementation of HyPar-Flow, we also provide comprehensive correctness and performance results on three state-of-the-art HPC systems including TACC Frontera (#5 on Top500.org). 
For ResNet-1001, an ultra-deep model, HyPar-Flow provides: 1) up to 1.6× speedup over Horovod-based data-parallel training, 2) 110× speedup over single-node on 128 Stampede2 nodes, and 3) 481× speedup over single-node on 512 Frontera nodes.
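
The communication pattern described in the abstract (send/recv between neighbouring model partitions, allreduce across model replicas) can be illustrated with a small rank-layout sketch. This is not HyPar-Flow's code or API: the function names `rank_to_coords` and `build_layout`, and the assumption that ranks are numbered replica by replica, are hypothetical choices made only for illustration. The sketch uses plain Python and performs no MPI calls; it only computes which ranks would talk to each other.

```python
from typing import Dict, List, Optional, Tuple


def rank_to_coords(rank: int, num_partitions: int) -> Tuple[int, int]:
    """Map a flat process rank to (replica_id, partition_id).

    Assumption: ranks are numbered replica by replica, so ranks
    0..num_partitions-1 hold the partitions of replica 0, and so on.
    """
    return divmod(rank, num_partitions)


def build_layout(num_replicas: int, num_partitions: int) -> Dict[int, dict]:
    """Compute, for every rank, its point-to-point and collective peers."""
    layout: Dict[int, dict] = {}
    for rank in range(num_replicas * num_partitions):
        replica, part = rank_to_coords(rank, num_partitions)

        # Point-to-point peers inside one replica: activations flow to the
        # next partition in the forward pass, partial errors flow back to
        # the previous partition in the backward pass.
        prev_rank: Optional[int] = rank - 1 if part > 0 else None
        next_rank: Optional[int] = rank + 1 if part < num_partitions - 1 else None

        # Collective peers: ranks holding the same partition in every
        # replica would average their gradients with an allreduce.
        allreduce_group: List[int] = [
            r * num_partitions + part for r in range(num_replicas)
        ]

        layout[rank] = {
            "replica": replica,
            "partition": part,
            "prev": prev_rank,
            "next": next_rank,
            "allreduce_group": allreduce_group,
        }
    return layout


if __name__ == "__main__":
    # Hybrid parallelism with 2 model replicas, each split into 3 partitions.
    for rank, info in build_layout(num_replicas=2, num_partitions=3).items():
        print(rank, info)
```

Under this numbering, setting `num_partitions=1` reduces to pure data parallelism (every rank is in one allreduce group) and setting `num_replicas=1` reduces to pure model parallelism (a single pipeline with no allreduce peers other than the rank itself).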

An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
subject	Deep learning; Parallel computing; Philosophy of religion; Artificial neural networks; Software quality; Concurrent computing; Distributed computing; Applied machine learning
part of	HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow
is abstract of	HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow
