
Browsing by Author "Dr. Vincent Freeh, Committee Member"

Now showing 1 - 5 of 5
    Bluetooth Intrusion Detection
    (2008-04-15) OConnor, Terrence; Dr. Douglas Reeves, Committee Chair; Dr. Peng Ning, Committee Member; Dr. Vincent Freeh, Committee Member
    Buddy Threading in Distributed Applications on Simultaneous Multi-Threading Processors
    (2005-04-19) Vouk, Nikola; Dr. Michael Rappa, Committee Member; Dr. Frank Mueller, Committee Chair; Dr. Vincent Freeh, Committee Member
    Modern processors provide a multitude of opportunities for instruction-level parallelism that most current applications cannot fully utilize. To increase processor core execution efficiency, modern processors can execute instructions from two or more tasks simultaneously in the functional units, increasing the number of instructions executed per cycle (IPC). These processors implement simultaneous multi-threading (SMT), which increases processor efficiency through thread-level parallelism, but problems can arise due to cache conflicts and CPU resource starvation. Consider high-end applications typically running on clusters of commodity computers. Each compute node sends, receives, and calculates data for some application. Non-SMT processors must compute data, context switch, communicate that data, context switch, compute more data, and so on. The computation phases often use floating-point functional units, while communication uses integer functional units. Until recently, communication libraries could not take full advantage of this parallelism because SMT hardware was not available. This thesis explores the feasibility of exploiting this natural compute/communicate parallelism in distributed applications, especially for applications that are not optimized for the constraints imposed by SMT hardware. This research explores hardware and software thread synchronization primitives to reduce inter-thread communication latency and operating system context switch time in order to maximize a program's ability to compute and communicate simultaneously. This work investigates the reduction of inter-thread communication latency through hardware synchronization primitives. These primitives allow threads to 'instantly' notify each other of changes in program state.
We also describe a thread-promoting buddy scheduler that allows threads to always be co-scheduled, thereby giving an application exclusive use of all processor resources and reducing context switch overhead, inter-thread communication latency, and scheduling overhead. Finally, we describe the design and implementation of a modified MPICH library that allows legacy applications to take advantage of SMT processor parallelism. We conclude with an evaluation of these techniques using several ASCI benchmarks. Overall, we show that compute-communicate application performance can be further improved by taking advantage of the native parallelism provided by SMT processors. To fully exploit this advantage, these applications must be written to overlap communication with computation as much as possible.
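The compute/communicate overlap described in this abstract can be pictured as two cooperating threads. What follows is only a minimal sketch in Python, not the thesis's modified MPI library or its buddy scheduler. The names `compute_chunk` and `run_overlapped` are hypothetical, the in-memory `sent` list stands in for the network, and CPython threads do not map onto separate SMT hardware contexts the way the buddy threads do.

```python
import queue
import threading


def compute_chunk(i):
    """Stand-in for a floating-point compute phase (hypothetical workload)."""
    return sum(x * 0.5 for x in range(i * 100, (i + 1) * 100))


def run_overlapped(num_chunks):
    """Compute chunks in the main thread while a buddy thread 'sends' them."""
    outbox = queue.Queue()
    sent = []  # stands in for the network; a real program would call a send routine

    def communicator():
        # Buddy thread: drains the outbox while the main thread keeps computing.
        while True:
            item = outbox.get()
            if item is None:  # sentinel: no more data
                break
            sent.append(item)

    buddy = threading.Thread(target=communicator)
    buddy.start()
    for i in range(num_chunks):
        outbox.put(compute_chunk(i))  # hand off the result, then continue computing
    outbox.put(None)
    buddy.join()
    return sent
```

The queue preserves hand-off order, so `run_overlapped(3)` returns the three chunk results in the order they were computed.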
    Instruction Cache Checkpoints Using Phase Tracking and Prediction
    (2005-12-30) Mukundan, Janani; Dr. Edward Davis, Committee Member; Dr. Vincent Freeh, Committee Member; Dr. Paul D. Franzon, Committee Chair
    The memory wall is standing taller than ever. There is an ever-growing imbalance between memory bandwidth and processor speeds. Because of these diverging rates, most applications are limited by memory performance. Various aggressive techniques to hide memory latency have done little to close this gap. Clearly, we will need better optimization techniques to bridge the gap between processor and memory speeds. In the future it will be necessary for us to understand program patterns and behavior at run time, so that we can efficiently utilize various optimization techniques. Past research [10] has suggested that programs tend to have cyclic patterns of execution. They tend to execute in phases, which repeat over time. It is possible to efficiently capture, classify and predict phase-based program behavior at run time [13]. We propose using Phase Tracking and Prediction to bridge the memory gap. We introduce the concept of Instruction Cache Checkpoints that exploit program behavior to prefetch into the Instruction Cache. The intuition behind this scheme is that since phase behavior can be predicted, we can effectively prefetch instructions according to phase transitions. We also propose a new, improved Phase Prediction architecture based on phase run-lengths. We begin by studying and evaluating phase behavior in SPEC2k FP benchmarks. The observed phase behavior is then exploited by creating Instruction Cache Checkpoints that use prefetching based on phase changes. Detailed simulation of five of the SPEC2k FP benchmarks shows that using Instruction Cache Checkpoints gives us an average reduction of 17.8% in the number of Instruction Cache misses.
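To make the idea of predicting phase transitions from run-lengths concrete, here is a minimal sketch in Python. It is an illustrative assumption, not the prediction architecture proposed in the thesis: the class `RunLengthPhasePredictor`, its tables, and the rule "expect a transition once the current run reaches its previous length" are all hypothetical simplifications.

```python
class RunLengthPhasePredictor:
    """Predicts the next phase ID from the last observed run length of each phase."""

    def __init__(self):
        self.last_run = {}   # phase ID -> length of its most recent completed run
        self.successor = {}  # phase ID -> phase that followed it last time
        self.current = None
        self.run_length = 0

    def observe(self, phase):
        """Record the phase of the interval that just finished."""
        if phase == self.current:
            self.run_length += 1
            return
        if self.current is not None:
            # A run just ended: remember how long it was and what came next.
            self.last_run[self.current] = self.run_length
            self.successor[self.current] = phase
        self.current = phase
        self.run_length = 1

    def predict(self):
        """Return the phase expected for the next interval."""
        expected = self.last_run.get(self.current)
        if expected is not None and self.run_length >= expected:
            # The current run has reached its previous length: expect a transition.
            return self.successor[self.current]
        return self.current
```

On a strictly cyclic trace such as three intervals of phase A followed by two of phase B, this sketch predicts every interval correctly once it has seen each phase complete one run. A prefetcher could use such a prediction to start loading the instructions associated with the upcoming phase before the transition occurs.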
    Scalable, Fault-Tolerant Membership for Group Communication on HPC Systems
    (2006-04-23) Varma, Jyothish S; Dr. Tao Xie, Committee Member; Dr. Vincent Freeh, Committee Member; Dr. Frank Mueller, Committee Chair
    Reliability is increasingly becoming a challenge for high-performance computing (HPC) systems with thousands of nodes, such as IBM's Blue Gene/L. A shorter mean-time-to-failure can be addressed by adding fault tolerance that reconfigures working nodes so that communication and computation can progress. However, existing approaches fall short in providing scalability and low reconfiguration overhead within the fault-tolerant layer. This thesis presents a scalable approach to reconfigure the communication infrastructure after node failures. We propose a decentralized (peer-to-peer) protocol that maintains a consistent view of active nodes in the presence of faults. Our protocol shows reconfiguration response times on the order of hundreds of microseconds using MPI over Blue Gene/L and single-digit milliseconds using TCP over Gigabit. The protocol can be adapted to match the network topology to further increase performance. We also verify experimental results against a performance model, which demonstrates the scalability of the approach. Hence, the membership service is suitable for deployment in the communication layer of MPI runtime systems.
    STI Friendly Clock Recovery Techniques for Bit Banged Communication Protocols
    (2004-05-19) Thirumoolan, Sudhagar; Dr. Alexander Dean, Committee Chair; Dr. Frank Mueller, Committee Co-Chair; Dr. Vincent Freeh, Committee Member
    Embedded communication networks are now used in a number of applications. A majority of these networks use a shared medium for communication. Controllers for these shared-medium protocols can be implemented in hardware or software. Protocol implementations in software using standard off-the-shelf microcontrollers have been found to be faster, easier and more cost-effective. In an embedded communication network, it is essential that the protocol controllers are in synchrony with each other. A slight phase difference between the clocks of the protocol controllers can lead to errors in the data propagated between them. If the sender has a slight error in its clock frequency, the receivers that are reading a message from the sender need to adapt to the clock of the sender by recovering the clock information from the data on the shared medium. This paper discusses the various existing techniques in hardware and software and compares a few proposed clock recovery techniques that can be used for software protocol implementations using STI. A good clock recovery technique must adapt to the sender as quickly as possible and maintain very little phase difference with the sender. It must consume very few computational cycles, and the code blocks must remain as close together as possible to avoid fragmenting the idle time of the protocol implementation. To facilitate STI, a good primary thread implementation needs large chunks of idle time that can be merged with useful code from a secondary thread. The primary thread should also have fixed or almost fixed execution times for all its functions, so that as much idle time as possible can be recovered. The proposed techniques have large chunks of idle time and also try to catch up with the sender's clock as quickly as possible. Hence they are good candidates for use in software protocol implementations using STI.
However, these techniques are limited in that they can recover only a fixed number of cycles in each bit interval when the receiver's clock is slower than the sender's.
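The limitation noted at the end of this abstract, a fixed bound on how many cycles can be recovered per bit interval, can be illustrated with a small simulation. This is a hedged sketch in Python rather than any of the techniques compared in the work: the function `track_phase`, its parameters, and the assumption that a correction is possible in every bit interval are hypothetical.

```python
def track_phase(drift_per_bit, max_adjust, num_bits, initial_error=0):
    """Return the receiver's phase error (in cycles) after each bit interval.

    drift_per_bit: cycles the receiver falls behind the sender per bit (assumed constant).
    max_adjust:    most cycles the receiver can recover in one bit interval.
    """
    error = initial_error
    history = []
    for _ in range(num_bits):
        error += drift_per_bit
        # Correct toward zero, but never by more than max_adjust cycles.
        correction = max(-max_adjust, min(max_adjust, error))
        error -= correction
        history.append(error)
    return history
```

With `max_adjust=3`, a drift of 2 cycles per bit is fully absorbed (`track_phase(2, 3, 4)` gives `[0, 0, 0, 0]`), while a drift of 5 cycles per bit exceeds the bound and the error grows (`track_phase(5, 3, 3)` gives `[2, 4, 6]`).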
