Publications
Predicting Execution Time of CUDA Kernel Using Static Analysis
Gargi Alavani, Kajal Varma, Santonu Sarkar
ISPA 2018
GPU-based parallel applications are becoming increasingly complex and long-running, which makes them energy inefficient. Anticipating execution time can help developers fix inefficient code before running it. We propose an approach to predict the execution time of a GPU kernel without executing it. We build an analytical model that predicts the execution time by analyzing the intermediate PTX code of a CUDA kernel.
View Publication
DTLB: Deterministic TLB for Tightly Bound Hard Real-time Systems
Kajal Varma, Geeta Patil, Biju Raveendran
VLSID 2017
This paper proposes a novel TLB architecture, the Deterministic Translation Lookaside Buffer (DTLB), to reduce TLB misses, energy consumption, and effective per-access time. This is achieved by backing up the TLB contents of the executing task to its process control block (PCB) on preemption, and transferring the saved contents back to the TLB when the task resumes execution. Experimental results obtained using MemSim, a single-clock-cycle simulator developed in Java with a Swing GUI, show that the DTLB offers on average 6.74% dynamic energy savings over a conventional TLB model.
View Publication
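The save-and-restore idea in the DTLB abstract above can be illustrated with a small sketch. This is not the paper's design or MemSim code: the `Task` and `SimpleTLB` classes, the unbounded dictionary standing in for the TLB, and the page tables are all hypothetical, chosen only to show why restoring saved entries avoids misses after a context switch.

```python
class Task:
    """Hypothetical task; saved_tlb stands in for a PCB field."""
    def __init__(self, name, page_table):
        self.name = name
        self.page_table = page_table  # virtual page -> physical frame
        self.saved_tlb = {}


class SimpleTLB:
    """Toy TLB with no capacity limit (an assumption of this sketch)."""
    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, task, vpage):
        if vpage in self.entries:
            self.hits += 1
        else:
            self.misses += 1
            self.entries[vpage] = task.page_table[vpage]
        return self.entries[vpage]

    def preempt(self, task):
        # Back up the running task's entries to its PCB, then clear the TLB.
        task.saved_tlb = dict(self.entries)
        self.entries.clear()

    def resume(self, task):
        # Reload the entries the task had when it was preempted.
        self.entries = dict(task.saved_tlb)


def run_demo():
    tlb = SimpleTLB()
    a = Task("A", {0: 10, 1: 11})
    b = Task("B", {0: 20})

    tlb.resume(a)
    tlb.translate(a, 0)   # miss
    tlb.translate(a, 1)   # miss
    tlb.preempt(a)

    tlb.resume(b)
    tlb.translate(b, 0)   # miss
    tlb.preempt(b)

    tlb.resume(a)
    tlb.translate(a, 0)   # hit: entry restored from the PCB
    tlb.translate(a, 1)   # hit
    return tlb.hits, tlb.misses
```

In this toy run, task A sees no misses after resuming, because task B never disturbs its saved entries.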
Projects
Independent Projects/Research
Energy Estimation of High-Performance Computing Applications
Research thesis on prediction of energy consumption of an NVIDIA CUDA kernel through static analysis of compiled PTX code and power modelling of benchmarks on an NVIDIA GPU. Paper published in ISPA 2018.
DTLB: Deterministic TLB for Tightly Bound Hard Real-Time Systems
Deterministic translation lookaside buffer (DTLB) and cache design and simulation for hard real-time systems, to eliminate inter-task interference and obtain dynamic energy savings. Paper published in IEEE VLSID 2017.
Deterministic Process-Aware Partitioned Cache for Tightly Bound Hard Real-Time Systems
Worked on a research project in the area of real-time operating systems, proposing a cache memory design that eliminates inter-task interference.
Design of Scheduler and Memory for a Real-Time System
Implemented an Earliest Deadline First Scheduler with Stack Resource Policy (EDF-SRP) in C for process scheduling and memory management in a real-time operating system. Implemented MemSim, a memory simulator GUI in Java with a stack, TLB, and cache, to simulate the memory footprint of the schedule generated by EDF-SRP.
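As a rough illustration of the scheduling idea, the sketch below shows preemptive Earliest Deadline First over discrete time steps. It is written in Python rather than the project's C, it does not model the Stack Resource Policy or memory management, and the job tuple layout and the `edf_schedule` function are hypothetical.

```python
import heapq


def edf_schedule(jobs, horizon):
    """Return which job runs at each time step under preemptive EDF.

    jobs: list of (name, release, deadline, exec_time) tuples.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # ordered by release time
    ready = []      # min-heap of [deadline, name, remaining]
    timeline = []
    i = 0
    for t in range(horizon):
        # Admit every job released at or before this time step.
        while i < len(pending) and pending[i][1] <= t:
            name, _, deadline, exec_time = pending[i]
            heapq.heappush(ready, [deadline, name, exec_time])
            i += 1
        if not ready:
            timeline.append(None)  # processor idle
            continue
        job = ready[0]             # earliest deadline runs next
        timeline.append(job[1])
        job[2] -= 1
        if job[2] == 0:
            heapq.heappop(ready)
    return timeline


demo = edf_schedule([("A", 0, 10, 3), ("B", 1, 4, 2)], horizon=6)
```

Here job B arrives at time 1 with an earlier deadline and preempts job A, which finishes afterwards.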
Mobile Edge Computing Packet Scheduling
Added support for simulation of Mobile Edge Computing (MEC) networks in NS-3 by implementing custom C++ modules. Added functionality to configure execution of several experiments in order to easily collect and plot data for research purposes.
Graduate Course Projects
PromotEd
GitHub
Built a visual interface that recommends courses from multiple online course providers based on desired job roles. Used Python for the machine learning algorithms, React for the interface, and shell scripts to collect data from MOOC APIs.
Pomodoro Time Tracker Web Application
GitHub
Built the web application front-end for a time-tracker productivity application using React.
MapReduce Infrastructure using gRPC
Implemented a MapReduce simulation in C++ by using gRPC for communication in a distributed service.
vCPU Scheduler and Memory Coordinator for Virtual Machines
Implemented, in C, a vCPU scheduler and memory coordinator that dynamically manage the resources assigned to each guest OS running on a hypervisor.
Course Recommendation System
Built a course recommendation website for Georgia Tech using student reviews, and performed sentiment analysis on the reviews. Compared user-based collaborative filtering, item-based collaborative filtering, and content-based recommendation.
Simulation of Delay Tolerant Networks
Simulated a delay-tolerant network using the ONE (Opportunistic Network Environment) simulator, and a TCP network using the NS-3 simulator.
Analysis of Cache Replacement Policy using SESC Simulator
Implemented the NXLRU (Next to Least Recently Used) cache replacement policy in the SuperESCalar Simulator (SESC). Gathered data on branch prediction accuracy and performance in an out-of-order processor. Classified cache misses as compulsory, conflict, capacity, or coherence misses.
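A minimal sketch of the replacement idea, assuming NXLRU means evicting the second-least recently used line in a full set (the project text does not spell out the exact rule). The `NxlruSet` class is hypothetical and models a single set in Python; it is not SESC code.

```python
from collections import OrderedDict


class NxlruSet:
    """One cache set; lines are kept in order from least to most recently used."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()

    def access(self, tag):
        """Return (hit, victim); victim is the evicted tag or None."""
        if tag in self.lines:
            self.lines.move_to_end(tag)   # mark as most recently used
            return True, None
        victim = None
        if len(self.lines) == self.ways:
            order = list(self.lines)
            # Assumed rule: evict the line next to the LRU line,
            # falling back to the LRU line in a one-way set.
            victim = order[1] if len(order) > 1 else order[0]
            del self.lines[victim]
        self.lines[tag] = True
        return False, victim


s = NxlruSet(ways=3)
for t in ("a", "b", "c"):
    s.access(t)
s.access("a")              # hit; recency order is now b, c, a
result = s.access("d")     # miss in a full set
```

With recency order b, c, a, plain LRU would evict b, while this sketch evicts c.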
Undergraduate Course Projects
Raspberry Pi Surveillance Camera with Motion Detection and Android Application
GitHub
Implemented motion detection, montage creation and email notifications with a Raspberry Pi and Pi Camera, which could be controlled through an Android application.
Linux Kernel Mouse Device Driver Implementation
GitHub
Implemented a Linux kernel device driver module in C that changes the screen brightness in response to mouse clicks.
Simulation of DiskSim disk scheduling algorithms
GitHub
DiskSim is hard disk simulation software used for I/O analysis research. Implemented a Java simulation of two of its scheduling algorithms: cyclic cylindrical access and shortest access time first. As a study of disk performance, collected disk usage data from a PC using IOMeter, an I/O subsystem measurement and characterization tool for single and clustered systems.
Phased Cache Design and Implementation
GitHub
Simulated an eight-way set-associative cache of 512 B with a 16 B line size using Verilog in ModelSim-Altera. Implemented a write-back policy and a FIFO replacement policy in the cache.
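The stated parameters fix the cache geometry, which the following sketch works out in Python (not the project's Verilog). The address-splitting function and the `FifoSet` class are hypothetical illustrations; write-back and any phased-access behaviour are not modelled.

```python
from collections import deque

CACHE_BYTES = 512
LINE_BYTES = 16
WAYS = 8

NUM_LINES = CACHE_BYTES // LINE_BYTES        # 32 lines in total
NUM_SETS = NUM_LINES // WAYS                 # 4 sets of 8 ways
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 4 bits select a byte in a line
INDEX_BITS = NUM_SETS.bit_length() - 1       # 2 bits select a set


def split_address(addr):
    """Split an address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class FifoSet:
    """One set with FIFO replacement: hits do not change eviction order."""

    def __init__(self, ways=WAYS):
        self.ways = ways
        self.tags = deque()   # oldest insertion on the left

    def access(self, tag):
        if tag in self.tags:
            return True
        if len(self.tags) == self.ways:
            self.tags.popleft()   # evict the oldest inserted line
        self.tags.append(tag)
        return False
```

For example, address 0x1A7 splits into tag 6, set index 2, and byte offset 7.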
NVML GPU Power Management Module
GitHub
Built a module that measures the average power drawn by an NVIDIA GPU during the execution of a CUDA kernel, which is especially useful for microbenchmarking instructions.
Hackathons
TurboTax Sam
Worked in a team of four to develop a chatbot for HackUtsav 2017, an internal hackathon held at Intuit. The chatbot provides quick and easy answers to customers from within TurboTax Windows. Our team won the hackathon.
VoicePay
Worked in a team of two at the In24Hrs Hackathon held at Intuit. Built a prototype of an Android application that enables visually impaired users to manage their bills by voice.