SC 2013: Denver, CO, USA
- William Gropp, Satoshi Matsuoka:
International Conference for High Performance Computing, Networking, Storage and Analysis, SC'13, Denver, CO, USA - November 17 - 21, 2013. ACM 2013, ISBN 978-1-4503-2378-9
ACM Gordon Bell finalists
- Peter W. J. Staar, Thomas A. Maier, Michael S. Summers, Gilles Fourestey, Raffaele Solcà, Thomas C. Schulthess:
Taking a quantum leap in time to solution for simulations of high-Tc superconductors. 1:1-1:11
- Massimo Bernaschi, Mauro Bisson, Massimiliano Fatica, Simone Melchionna:
20 petaflops simulation of proteins suspensions in crowding conditions. 2:1-2:11
- Diego Rossinelli, Babak Hejazialhosseini, Panagiotis E. Hadjidoukas, Costas Bekas, Alessandro Curioni, Adam Bertsch, Scott Futral, Steffen J. Schmidt, Nikolaus A. Adams, Petros Koumoutsakos:
11 PFLOP/s simulations of cloud cavitation collapse. 3:1-3:13
- Peter A. Boyle, Michael I. Buchoff, Norman H. Christ, Taku Izubuchi, Chulwoo Jung, Thomas C. Luu, Robert D. Mawhinney, Chris Schroeder, Ron Soltz, Pavlos Vranas, Joseph Wasem:
The origin of mass. 4:1-4:10
- Michael Bussmann, Heiko Burau, Thomas E. Cowan, Alexander Debus, Axel Huebl, Guido Juckeland, Thomas Kluge, Wolfgang E. Nagel, Richard Pausch, Felix Schmitt, Ulrich Schramm, Joseph Schuchart, René Widera:
Radiative signatures of the relativistic Kelvin-Helmholtz instability. 5:1-5:12
- Salman Habib, Vitali A. Morozov, Nicholas Frontiere, Hal Finkel, Adrian Pope, Katrin Heitmann:
HACC: extreme scaling and performance across diverse architectures. 6:1-6:10
Fault-tolerant computing
- Xiang Ni, Esteban Meneses, Nikhil Jain, Laxmikant V. Kalé:
ACR: automatic checkpoint/restart for soft and hard error protection. 7:1-7:12
- Thomas Ropars, Tatiana V. Martsinkevich, Amina Guermouche, André Schiper, Franck Cappello:
SPBC: leveraging the characteristics of MPI HPC applications for scalable checkpointing. 8:1-8:12
- Ke Wang, Abhishek Kulkarni, Michael Lang, Dorian C. Arnold, Ioan Raicu:
Using simulation to explore distributed key-value stores for extreme-scale system services. 9:1-9:12
GPU programming
- Michael Goldfarb, Youngjoon Jo, Milind Kulkarni:
General transformations for GPU execution of tree traversals. 10:1-10:12
- Alberto Magni, Christophe Dubach, Michael F. P. O'Boyle:
A large-scale cross-architecture evaluation of thread-coarsening. 11:1-11:11
- Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar:
Semi-automatic restructuring of offloadable tasks for many-core accelerators. 12:1-12:12
Load balancing
- Pai-Wei Lai, Kevin Stock, Samyam Rajbhandari, Sriram Krishnamoorthy, P. Sadayappan:
A framework for load balancing of tensor contraction expressions via dynamic task partitioning. 13:1-13:10
- Md. Kamruzzaman, Steven Swanson, Dean M. Tullsen:
Load-balanced pipeline parallelism. 14:1-14:12
- Harshitha Menon, Laxmikant V. Kalé:
A distributed dynamic load balancer for iterative applications. 15:1-15:11
MPI performance and debugging
- Tobias Hilbrich, Bronis R. de Supinski, Wolfgang E. Nagel, Joachim Protze, Christel Baier, Matthias S. Müller:
Distributed wait state tracking for runtime MPI deadlock detection. 16:1-16:12
- Nilesh Mahajan, Uday Pitambare, Arun Chauhan:
Globalizing selectively: shared-memory efficiency with address-space separation. 17:1-17:12
- Andrew Friedley, Greg Bronevetsky, Torsten Hoefler, Andrew Lumsdaine:
Hybrid MPI: efficient message passing for multi-core systems. 18:1-18:11
Memory hierarchy
- Richard M. Yoo, Christopher J. Hughes, Konrad Lai, Ravi Rajwar:
Performance evaluation of Intel® transactional synchronization extensions for high-performance computing. 19:1-19:11
- Jongsoo Park, Richard M. Yoo, Daya Shanker Khudia, Christopher J. Hughes, Daehyun Kim:
Location-aware cache management for many-core processors with deep cache hierarchy. 20:1-20:12
- Doe Hyun Yoon, Jichuan Chang, Robert S. Schreiber, Norman P. Jouppi:
Practical nonvolatile multilevel-cell phase change memory. 21:1-21:12
Memory resilience
- Vilas Sridharan, Jon Stearley, Nathan DeBardeleben, Sean Blanchard, Sudhanva Gurumurthi:
Feng shui of supercomputer memory: positional effects in DRAM and SRAM faults. 22:1-22:11
- Bharan Giridhar, Michael Cieslak, Deepankar Duggal, Ronald G. Dreslinski, Hsing Min Chen, Robert Patti, Betina Hold, Chaitali Chakrabarti, Trevor N. Mudge, David T. Blaauw:
Exploring DRAM organizations for energy-efficient and resilient exascale memories. 23:1-23:12
- Xun Jian, Henry Duwe, John Sartori, Vilas Sridharan, Rakesh Kumar:
Low-power, low-storage-overhead chipkill correct via multi-line error correction. 24:1-24:12
Optimizing numerical code
- Qian Wang, Xianyi Zhang, Yunquan Zhang, Qing Yi:
AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs. 25:1-25:12
- Wai Teng Tang, Wen Jun Tan, Rajarshi Ray, Yi Wen Wong, Weiguang Chen, Shyh-Hao Kuo, Rick Siow Mong Goh, Stephen John Turner, Weng-Fai Wong:
Accelerating sparse matrix-vector multiplication on GPUs using bit-representation-optimized schemes. 26:1-26:12
- Cindy Rubio-González, Cuong Nguyen, Hong Diep Nguyen, James Demmel, William Kahan, Koushik Sen, David H. Bailey, Costin Iancu, David Hough:
Precimonious: tuning assistant for floating-point precision. 27:1-27:12
Parallel performance tools
- Xu Liu, John M. Mellor-Crummey:
A data-centric profiler for parallel programs. 28:1-28:12
- Germán Llort, Harald Servat, Juan Gonzalez, Judit Giménez, Jesús Labarta:
On the usefulness of object tracking techniques in performance analysis. 29:1-29:11
- Sanath Jayasena, Saman P. Amarasinghe, Asanka Abeyweera, Gayashan Amarasinghe, Himeshi De Silva, Sunimal Rathnayake, Xiaoqiao Meng, Yanbin Liu:
Detection of false sharing using machine learning. 30:1-30:9
Parallel programming models and compilation
- Zhao Zhang, Daniel S. Katz, Timothy G. Armstrong, Justin M. Wozniak, Ian T. Foster:
Parallelizing the execution of sequential scripts. 31:1-31:12
- Hans Vandierendonck, Kallia Chronaki, Dimitrios S. Nikolopoulos:
Deterministic scale-free pipeline parallelism with hyperqueues. 32:1-32:12
- Uday Bondhugula:
Compiling affine loop nests for distributed-memory parallel architectures. 33:1-33:12
Performance analysis of applications at large scale
- Jongsoo Park, Ganesh Bikshandi, Karthikeyan Vaidyanathan, Ping Tak Peter Tang, Pradeep Dubey, Daehyun Kim:
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors. 34:1-34:12
- Christian Godenschwager, Florian Schornbaum, Martin Bauer, Harald Köstler, Ulrich Rüde:
A framework for hybrid parallel flow simulations with a trillion cells in complex geometries. 35:1-35:12
- Xin Yuan, Santosh Mahapatra, Wickus Nienaber, Scott Pakin, Michael Lang:
A new routing scheme for Jellyfish and its performance with HPC workloads. 36:1-36:11
Performance management of HPC systems
- Alex D. Breslow, Ananta Tiwari, Martin Schulz, Laura Carrington, Lingjia Tang, Jason Mars:
Enabling fair pricing on HPC systems with node sharing. 37:1-37:12
- Mingliang Liu, Ye Jin, Jidong Zhai, Yan Zhai, Qianqian Shi, Xiaosong Ma, Wenguang Chen:
ACIC: automatic cloud I/O configurator for HPC applications. 38:1-38:12
- Shaolei Ren, Yuxiong He:
COCA: online distributed resource management for cost minimization and carbon neutrality in data centers. 39:1-39:12
System-wide application performance assessments
- Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramírez, Mateo Valero:
Supercomputing with commodity CPUs: are mobile SoCs ready for HPC? 40:1-40:12
- Abhinav Bhatele, Kathryn M. Mohror, Steve H. Langer, Katherine E. Isaacs:
There goes the neighborhood: performance degradation due to nearby jobs. 41:1-41:12
- Xiaobing Li, Yandong Wang, Yizheng Jiao, Cong Xu, Weikuan Yu:
CooMR: cross-task coordination for efficient data management in MapReduce programs. 42:1-42:11
Tools for scalable analysis
- Milind Chabbi, Karthik Murthy, Michael W. Fagan, John M. Mellor-Crummey:
Effective sampling-driven performance tools for GPU-accelerated supercomputers. 43:1-43:12
- Dong Li, Zizhong Chen, Panruo Wu, Jeffrey S. Vetter:
Rethinking algorithm-based fault tolerance with a cooperative software-hardware approach. 44:1-44:12
- Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf:
Using automated performance modeling to find scalability bugs in complex codes. 45:1-45:12
Data management in the cloud
- Kisung Lee, Ling Liu:
Efficient data partitioning model for heterogeneous graphs in the cloud. 46:1-46:12
- Yu Su, Yi Wang, Gagan Agrawal, Rajkumar Kettimuthu:
SDQuery DSI: integrating data management support with a wide area data transfer protocol. 47:1-47:12
- Yufei Ren, Tan Li, Dantong Yu, Shudong Jin, Thomas G. Robertazzi:
Design and performance evaluation of NUMA-aware RDMA-based end-to-end data transfer systems. 48:1-48:10
Graph partitioning and data clustering
- Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok N. Choudhary:
Scalable parallel OPTICS data clustering using graph algorithmic techniques. 49:1-49:12
- Erik G. Boman, Karen D. Devine, Sivasankaran Rajamanickam:
Scalable matrix computations on large scale-free graphs using 2D graph partitioning. 50:1-50:12
- Shad Kirmani, Padma Raghavan:
Scalable parallel graph partitioning. 51:1-51:10
Inter-node communication
- George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Channel reservation protocol for over-subscribed channels and destinations. 52:1-52:12
- Robert Gerstenberger, Maciej Besta, Torsten Hoefler:
Enabling highly-scalable remote memory access programming with MPI-3 one sided. 53:1-53:12
- Sreeram Potluri, Devendar Bureddy, Khaled Hamidouche, Akshay Venkatesh, Krishna Chaitanya Kandalla, Hari Subramoni, Dhabaleswar K. Panda:
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for Intel MIC clusters. 54:1-54:11
Cloud resource management and scheduling
- Kefeng Deng, Junqiang Song, Kaijun Ren, Alexandru Iosup:
Exploring portfolio scheduling for long-term execution of scientific workloads in IaaS clouds. 55:1-55:12
- Shuangcheng Niu, Jidong Zhai, Xiaosong Ma, Xiongchao Tang, Wenguang Chen:
Cost-effective cloud HPC resource provisioning by building semi-elastic virtual clusters. 56:1-56:12
- Alok Gautam Kumbhare, Yogesh Simmhan, Viktor K. Prasanna:
Exploiting application dynamism and cloud elasticity for continuous dataflows. 57:1-57:12
Energy management
- Osman Sarood, Esteban Meneses, Laxmikant V. Kalé:
A 'cool' way of improving the reliability of HPC machines. 58:1-58:12
- Indrani Paul, Vignesh T. Ravi, Srilatha Manne, Manish Arora, Sudhakar Yalamanchili:
Coordinated energy management in heterogeneous processors. 59:1-59:12
- Xu Yang, Zhou Zhou, Sean Wallace, Zhiling Lan, Wei Tang, Susan Coghlan, Michael E. Papka:
Integrating dynamic pricing of electricity into energy aware scheduling for HPC systems. 60:1-60:11
Extreme-scale applications
- Myoungkyu Lee, Nicholas Malaya, Robert D. Moser:
Petascale direct numerical simulation of turbulent channel flow on up to 786K cores. 61:1-61:11
- Iván Bermejo-Moreno, Julien Bodart, Johan Larsson, Blaise M. Barney, Joseph W. Nichols, Steve Jones:
Solving the compressible Navier-Stokes equations on up to 1.97 million cores and 4.1 trillion grid points. 62:1-62:10
- Peter Johnsen, Mark Straka, Melvyn Shapiro, Alan Norton, Thomas Galarneau:
Petascale WRF simulation of Hurricane Sandy: deployment of NCSA's Cray XE6 Blue Waters. 63:1-63:7
Fault tolerance and migration in the cloud
- Sheng Di, Yves Robert, Frédéric Vivien, Derrick Kondo, Cho-Li Wang, Franck Cappello:
Optimization of cloud task processing with checkpoint-restart mechanism. 64:1-64:12
- Kaveh Razavi, Thilo Kielmann:
Scalable virtual machine deployment using VM image caches. 65:1-65:12
- Jihun Kim, Dongju Chae, Jangwoo Kim, Jong Kim:
Guide-copy: fast and silent migration of virtual machine for datacenters. 66:1-66:12
IO tuning
- Sidharth Kumar, Avishek Saha, Venkatram Vishwanath, Philip H. Carns, John A. Schmidt, Giorgio Scorzelli, Hemanth Kolla, Ray W. Grout, Robert Latham, Robert B. Ross, Michael E. Papka, Jacqueline Chen, Valerio Pascucci:
Characterization and modeling of PIDX parallel I/O for performance optimization. 67:1-67:12
- Babak Behzad, Huong Vu Thanh Luu, Joseph Huchette, Surendra Byna, Prabhat, Ruth A. Aydt, Quincey Koziol, Marc Snir:
Taming parallel I/O complexity with auto-tuning. 68:1-68:12
- Da Zheng, Randal C. Burns, Alexander S. Szalay:
Toward millions of file system IOPS on low-cost, commodity hardware. 69:1-69:12
Physical frontiers
- Yifeng Cui, Efecan Poyraz, Kim B. Olsen, Jun Zhou, Kyle Withers, Scott Callaghan, Jeff Larkin, Clark C. Guest, Dong Ju Choi, Amit Chourasia, Zheqiang Shi, Steven M. Day, Philip Maechling, Thomas H. Jordan:
Physics-based seismic hazard analysis on petascale heterogeneous supercomputers. 70:1-70:12
- Manaschai Kunaseth, Rajiv K. Kalia, Aiichiro Nakano, Ken-ichi Nomura, Priya Vashishta:
A scalable parallel algorithm for dynamic range-limited n-tuple computation in many-body molecular dynamics simulation. 71:1-71:12
- Michael S. Warren:
2HOT: an improved parallel hashed oct-tree n-body algorithm for cosmological simulation. 72:1-72:12
Optimizing data movement
- Joe B. Buck, Noah Watkins, Greg Levin, Adam Crume, Kleoni Ioannidou, Scott A. Brandt, Carlos Maltzahn, Neoklis Polyzotis, Aaron Torres:
SIDR: structure-aware intelligent data routing in Hadoop. 73:1-73:12
- Tong Jin, Fan Zhang, Qian Sun, Hoang Bui, Manish Parashar, Hongfeng Yu, Scott Klasky, Norbert Podhorszki, Hasan Abbasi:
Using cross-layer adaptations for dynamic data management in large scale coupled scientific workflows. 74:1-74:12
- Myoungsoo Jung, Ellis Herbert Wilson, Wonil Choi, John Shalf, Hasan Metin Aktulga, Chao Yang, Erik Saule, Ümit V. Çatalyürek, Mahmut T. Kandemir:
Exploring the future of out-of-core computing with compute-local non-volatile memory. 75:1-75:11
In-situ data analytics and reduction
- Daniel E. Laney, Steven Langer, Christopher Weber, Peter Lindstrom, Al Wegener:
Assessing the effects of data compression in simulations using physically motivated metrics. 76:1-76:12
- Marc Gamell, Ivan Rodero, Manish Parashar, Janine Bennett, Hemanth Kolla, Jacqueline Chen, Peer-Timo Bremer, Aaditya G. Landge, Attila Gyulassy, Patrick S. McCormick, Scott Pakin, Valerio Pascucci, Scott Klasky:
Exploring power behaviors and trade-offs of in-situ data analytics. 77:1-77:12
- Fang Zheng, Hongfeng Yu, Can Hantas, Matthew Wolf, Greg Eisenhauer, Karsten Schwan, Hasan Abbasi, Scott Klasky:
GoldRush: resource efficient in situ scientific data analytics using fine-grained interference aware execution. 78:1-78:12
Preconditioners and unstructured meshes
- James King, Robert M. Kirby:
A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes. 79:1-79:12
- Pierre Jolivet, Frédéric Hecht, Frédéric Nataf, Christophe Prud'homme:
Scalable domain decomposition preconditioners for heterogeneous elliptic problems. 80:1-80:11
- Long Qu, Laura Grigori, Frédéric Nataf:
Parallel design and performance of nested filtering factorization preconditioner. 81:1-81:12
Engineering scalable applications
- Bei Wang, Stéphane Ethier, William M. Tang, Timothy J. Williams, Khaled Z. Ibrahim, Kamesh Madduri, Samuel Williams, Leonid Oliker:
Kinetic turbulence simulations at extreme scale on leadership-class systems. 82:1-82:12
- Florian Wende, Thomas Steinke:
Swendsen-Wang multi-cluster algorithm for the 2D/3D Ising model on Xeon Phi and GPU. 83:1-83:12
- Benjamin Welton, Evan Samanas, Barton P. Miller:
Mr. Scan: extreme scale density-based clustering using a tree-based network of GPGPU nodes. 84:1-84:11
Improving large-scale computation and data resources
- Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, Jason Zurawski:
The Science DMZ: a network design pattern for data-intensive science. 85:1-85:10
- James C. Browne, Robert L. DeLeon, Charng-Da Lu, Matthew D. Jones, Steven M. Gallo, Amin Ghadersohi, Abani K. Patra, William L. Barth, John L. Hammond, Thomas R. Furlani, Robert T. McLay:
Enabling comprehensive data-driven system management for large computational facilities. 86:1-86:11
- Jay F. Lofstead, Robert Ross:
Insights for exascale IO APIs from building a petascale IO API. 87:1-87:12
Matrix computations
- Yulu Jia, George Bosilca, Piotr Luszczek, Jack J. Dongarra:
Parallel reduction to Hessenberg form with algorithm-based fault tolerance. 88:1-88:11
- Oded Green, Yitzhak Birk:
A computationally efficient algorithm for the 2D covariance method. 89:1-89:12
- Azzam Haidar, Jakub Kurzak, Piotr Luszczek:
An improved parallel singular value algorithm and its implementation for multicore hardware. 90:1-90:12
Sorting and graph algorithms
- Md. Maksudul Alam, Maleq Khan, Madhav V. Marathe:
Distributed-memory parallel algorithms for generating massive scale-free networks using preferential attachment model. 91:1-91:12
- Sungpack Hong, Nicole C. Rodia, Kunle Olukotun:
On fast parallel detection of strongly connected components (SCC) in small-world graphs. 92:1-92:11
- Hari Sundar, Dhairya Malhotra, Karl W. Schulz:
Algorithms for high-throughput disk-to-disk sorting. 93:1-93:10
Application performance characterization
- Subhash Saini, Haoqiang Jin, Dennis C. Jespersen, Huiyu Feng, M. Jahed Djomehri, William Arasin, Robert Hood, Piyush Mehrotra, Rupak Biswas:
An early performance evaluation of many integrated core architecture based SGI rackable computing system. 94:1-94:12
- Nikhil Jain, Abhinav Bhatele, Michael P. Robson, Todd Gamblin, Laxmikant V. Kalé:
Predicting application performance using supervised learning on communication features. 95:1-95:12
- Qingyu Meng, Alan Humphrey, John A. Schmidt, Martin Berzins:
Investigating applications portability with the Uintah DAG-based runtime system on PetaScale supercomputers. 96:1-96:12