Researcher profile

Borivoje Nikolic

Borivoje Nikolic contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

Rethinking Network Topologies for Cost-Effective Mixture-of-Experts LLM Serving

Mixture-of-experts (MoE) architectures have turned LLM serving into a cluster-scale workload in which communication consumes a considerable portion of LLM serving runtime. This has prompted industry to invest heavily in expensive high-bandwidth scale-up networks. We question whether such costly infrastructure is strictly necessary. We present the first systematic cross-layer analysis of network cost-effectiveness for MoE LLM serving, comparing four representative XPU (e.g., GPU/TPU) topologies (scale-up, scale-out, 3D torus, and 3D full-mesh). We find that lower-cost switchless topologies are more cost-effective than the scale-up topology across all serving scenarios explored, improving cost-effectiveness by 20.6-56.2%. In particular, the 3D full-mesh topology is Pareto-optimal in terms of the performance-cost tradeoff. We also find that current scale-up link bandwidths are over-provisioned: reducing the link bandwidth improves throughput per cost by up to 27%. A forward-looking analysis of upcoming GPU generations indicates that the cost-performance advantage of switchless networks will likely persist.

preprint2020arXiv

AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs

Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original design specification. Existing automated solutions that optimize circuit parameters for a given target design specification have limitations of being schematic-only, inaccurate, sample-inefficient or not generalizable. This work presents AutoCkt, a machine learning optimization framework trained using deep reinforcement learning that not only finds post-layout circuit parameters for a given target specification, but also gains knowledge about the entire design space through a sparse subsampling technique. Our results show that for multiple circuit topologies, AutoCkt is able to converge and meet all target specifications on at least 96.3% of tested design goals in schematic simulation, on average 40X faster than a traditional genetic algorithm. Using the Berkeley Analog Generator, AutoCkt is able to design 40 LVS passed operational amplifiers in 68 hours, 9.6X faster than the state-of-the-art when considering layout parasitics.