Researcher profile

Sorin Cotofana

Sorin Cotofana contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust score 13 (unverified) · Verification level L1 · Unclaimed author
2 works · 0 followers · 4 topics · 4 close collaborators

Published work

2 published items

Preprint · 2022 · arXiv

Would Magnonic Circuits Outperform CMOS Counterparts?

In the early stages of a novel technology's development, it is difficult to provide a comprehensive assessment of its potential capabilities and impact. Nevertheless, some preliminary estimates can be drawn, and they are certainly of great interest. In this paper we follow this line of reasoning within the framework of the Spin Wave (SW) computing paradigm. In particular, we are interested in assessing the technological development horizon that must be reached to unleash the full potential of the SW paradigm, such that SW circuits can outperform their CMOS counterparts in terms of energy consumption. In view of the zero-power propagation of SWs through ferromagnetic waveguides, the overall SW circuit power consumption is determined by the power associated with SW generation and sensing by means of transducers. While current antenna-based transducers are clearly power hungry, recent developments indicate that magneto-electric (ME) cells have great potential for ultra-low-power SW generation and sensing. Given that ME cells have only been proposed at the conceptual level and no experimental demonstration has been reported, we cannot evaluate the impact of their use on SW circuit energy consumption. However, we can perform a reverse-engineering-style analysis to determine upper bounds on ME delay and power consumption that would place SW circuits in the leading position. To this end, we use a 32-bit Brent-Kung Adder (BKA) as a discussion vehicle and compute the maximum ME delay and power consumption that could enable an SW implementation to outperform its 7 nm CMOS counterpart. We evaluate different BKA SW implementations that rely on conversion or normalization gate cascading, and we consider both continuous and pulsed SW generation scenarios. The maximum transducer power consumption for which a 32-bit BKA SW implementation can outperform its 7 nm CMOS counterpart is 31 nW.
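The abstract uses a 32-bit Brent-Kung Adder as its benchmark circuit. As a hedged illustration of what that circuit computes, the sketch below models only the BKA's logical function: per-bit generate/propagate signals combined through the Brent-Kung up-sweep and down-sweep prefix tree. It says nothing about spin-wave gates, ME transducers, or the conversion/normalization cascading evaluated in the paper, and the function name `brent_kung_add` is hypothetical.

```python
def brent_kung_add(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Add two unsigned integers using a Brent-Kung parallel-prefix carry tree.

    Returns (sum modulo 2**width, carry_out). Assumes carry-in = 0 and that
    width is a power of two.
    """
    if width < 2 or width & (width - 1):
        raise ValueError("width must be a power of two >= 2")
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    # Per-bit generate (a AND b) and propagate (a XOR b) signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    # G[i], P[i] hold the group generate/propagate of the span ending at bit i.
    G, P = g[:], p[:]

    def combine(hi: int, lo: int) -> None:
        # Prefix operator: (G_hi, P_hi) o (G_lo, P_lo)
        G[hi] = G[hi] | (P[hi] & G[lo])
        P[hi] = P[hi] & P[lo]

    # Up-sweep: completes prefixes [i:0] for i = 2^k - 1 and builds partial spans.
    step = 1
    while step < width:
        for i in range(2 * step - 1, width, 2 * step):
            combine(i, i - step)
        step *= 2

    # Down-sweep: extends the remaining partial spans to full prefixes [i:0].
    step = width // 4
    while step >= 1:
        for i in range(3 * step - 1, width, 2 * step):
            combine(i, i - step)
        step //= 2

    # G[i] is now the carry out of bit i, i.e. the carry into bit i + 1.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    return total, G[width - 1]


print(brent_kung_add(0xFFFFFFFF, 1))  # (0, 1)
```

For width 32 this tree uses 57 prefix operators arranged in 9 levels, the usual Brent-Kung trade-off: far fewer cells than a Kogge-Stone adder at roughly twice the logic depth.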

Preprint · 2020 · arXiv

Evolutionary Bin Packing for Memory-Efficient Dataflow Inference Acceleration on FPGA

Convolutional neural network (CNN) dataflow inference accelerators implemented in Field Programmable Gate Arrays (FPGAs) have demonstrated increased energy efficiency and lower latency compared to CNN execution on CPUs or GPUs. However, the complex shapes of CNN parameter memories do not typically map well to FPGA on-chip memories (OCM), which results in poor OCM utilization and ultimately limits the size and types of CNNs that can be effectively accelerated on FPGAs. In this work, we present a design methodology that improves the efficiency of mapping CNN parameters to FPGA OCM. We frame the mapping as a bin packing problem and determine that traditional bin packing algorithms are not well suited to solving it within FPGA- and CNN-specific constraints. We hybridize genetic algorithms and simulated annealing with traditional bin packing heuristics to create flexible mappers capable of grouping parameter memories such that each group optimally fits FPGA on-chip memories. We evaluate these algorithms on a variety of FPGA inference accelerators. Our hybrid mappers converge to optimal solutions in a matter of seconds for all CNN use cases, achieve an increase of up to 65% in OCM utilization efficiency for deep CNNs, and are up to 200× faster than current state-of-the-art simulated annealing approaches.
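The abstract frames the parameter-to-OCM mapping as bin packing and builds on traditional heuristics. As a hedged baseline illustration, the sketch below implements first-fit decreasing (FFD), one classic bin-packing heuristic; it is not the paper's genetic-algorithm or simulated-annealing hybrid. It simplifies each parameter memory to a single size and each OCM block to a single capacity, whereas real FPGA memories have width/depth shapes. The names `first_fit_decreasing` and `utilization`, and the optional `max_items` cap (a hypothetical stand-in for an FPGA-specific constraint), are assumptions for illustration.

```python
from typing import Optional


def first_fit_decreasing(sizes: list[int], capacity: int,
                         max_items: Optional[int] = None) -> list[list[int]]:
    """Group item sizes into bins of fixed capacity using first-fit decreasing.

    max_items is a hypothetical per-bin limit on the number of grouped items.
    """
    if any(s > capacity for s in sizes):
        raise ValueError("an item is larger than a single bin")
    bins: list[list[int]] = []
    loads: list[int] = []
    # Place the largest items first; each goes into the first bin it fits.
    for s in sorted(sizes, reverse=True):
        for k, load in enumerate(loads):
            fits = load + s <= capacity
            room = max_items is None or len(bins[k]) < max_items
            if fits and room:
                bins[k].append(s)
                loads[k] += s
                break
        else:
            # No existing bin can take the item, so open a new one.
            bins.append([s])
            loads.append(s)
    return bins


def utilization(bins: list[list[int]], capacity: int) -> float:
    """Fraction of the allocated bin capacity actually occupied by items."""
    used = sum(sum(b) for b in bins)
    return used / (len(bins) * capacity) if bins else 0.0


packed = first_fit_decreasing([7, 5, 4, 3, 2, 1], capacity=10)
print(packed)  # [[7, 3], [5, 4, 1], [2]]
```

Here three bins hold 22 of 30 units of capacity, about 73% utilization. A greedy heuristic like this commits to each placement once, whereas a genetic or annealing mapper searches over alternative groupings, which is the kind of flexibility the abstract attributes to its hybrid approach.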