Researcher profile

Sandor Kruk

Sandor Kruk contributes to research discovery and scholarly infrastructure.


Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
9 works
0 followers
8 topics
4 close collaborators

Published work

9 published items

preprint, 2024, arXiv

AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets

We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.

preprint, 2024, arXiv

Hubble Asteroid Hunter III. Physical properties of newly found asteroids

Determining the size distribution of asteroids is key for understanding the collisional history and evolution of the inner Solar System. We aim to improve our knowledge of the size distribution of small asteroids in the Main Belt by determining the parallaxes of newly detected asteroids in the Hubble Space Telescope (HST) Archive and hence their absolute magnitudes and sizes. Asteroids appear as curved trails in HST images due to the parallax induced by the fast orbital motion of the spacecraft. The parallax effect can be computed to obtain the distance to the asteroids by fitting simulated trajectories to the observed trails. Using the distance, we can obtain the object's absolute magnitude and a size estimate assuming an albedo value, along with some bounds on its orbital parameters. In this work we analyse a set of 632 serendipitously imaged asteroids found in the ESA HST Archive. An object-detection machine learning algorithm was used to find these asteroids in previous work. Our raw data consist of 1,031 asteroid trails from unknown objects (not matching any entries in the MPC database). We also found 670 trails from known objects (objects featuring matching entries in the MPC). After an accuracy assessment and filtering process, our analysed HST set consists of 454 unknown objects and 178 known objects. We obtain a sample dominated by potential Main Belt objects featuring absolute magnitudes (H) mostly between 15 and 22 mag. The absolute magnitude cumulative distribution confirms the previously reported slope change for 15 < H < 18, from 0.56 to 0.26, maintained in our case down to absolute magnitudes around H = 20, hence expanding the previous results by approximately two magnitudes. HST archival observations can be used as an asteroid survey since the telescope pointings are statistically randomly oriented in the sky and they cover long periods of time.
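The step from absolute magnitude to size mentioned in this abstract can be sketched with the standard relation D = 1329 km / sqrt(p) × 10^(-H/5), where p is the geometric albedo. This is only an illustration and not the authors' pipeline: the function name and the albedo of 0.15 are assumptions chosen for the example.

```python
import math


def asteroid_diameter_km(abs_mag_h: float, albedo: float) -> float:
    """Estimate a diameter in km from absolute magnitude H and geometric albedo.

    Uses the standard relation D = 1329 / sqrt(albedo) * 10**(-H / 5).
    """
    if albedo <= 0:
        raise ValueError("albedo must be positive")
    return 1329.0 / math.sqrt(albedo) * 10 ** (-abs_mag_h / 5.0)


if __name__ == "__main__":
    # Assumed albedo of 0.15 (illustrative; not a value taken from the abstract).
    for h in (15, 18, 20, 22):
        print(f"H = {h}: ~{asteroid_diameter_km(h, 0.15):.2f} km")
```

Because the diameter scales as 10^(-H/5), each additional 5 magnitudes in H corresponds to an object ten times smaller at fixed albedo.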

preprint, 2022, arXiv

Galaxy Zoo DECaLS: Detailed Visual Morphology Measurements from Volunteers and Deep Learning for 314,000 Galaxies

We present Galaxy Zoo DECaLS: detailed visual morphological classifications for Dark Energy Camera Legacy Survey images of galaxies within the SDSS DR8 footprint. Deeper DECaLS images (r=23.6 vs. r=22.2 from SDSS) reveal spiral arms, weak bars, and tidal features not previously visible in SDSS imaging. To best exploit the greater depth of DECaLS images, volunteers select from a new set of answers designed to improve our sensitivity to mergers and bars. Galaxy Zoo volunteers provide 7.5 million individual classifications over 314,000 galaxies. 140,000 galaxies receive at least 30 classifications, sufficient to accurately measure detailed morphology like bars, and the remainder receive approximately 5. All classifications are used to train an ensemble of Bayesian convolutional neural networks (a state-of-the-art deep learning method) to predict posteriors for the detailed morphology of all 314,000 galaxies. When measured against confident volunteer classifications, the networks are approximately 99% accurate on every question. Morphology is a fundamental feature of every galaxy; our human and machine classifications are an accurate and detailed resource for understanding how galaxies evolve.

preprint, 2022, arXiv

Galaxy Zoo: Clump Scout: Surveying the Local Universe for Giant Star-forming Clumps

Massive, star-forming clumps are a common feature of high-redshift star-forming galaxies. How they formed, and why they are so rare at low redshift, remains unclear. In this paper we identify the largest yet sample of clumpy galaxies (7,052) at low redshift using data from the citizen science project Galaxy Zoo: Clump Scout, in which volunteers classified over 58,000 Sloan Digital Sky Survey (SDSS) galaxies spanning redshift $0.02 < z < 0.15$. We apply a robust completeness correction by comparing with simulated clumps identified by the same method. Requiring that the ratio of clump-to-galaxy flux in the SDSS $u$ band be greater than 8% (similar to clump definitions used by other works), we estimate the fraction of local galaxies hosting at least one clump ($f_{clumpy}$) to be $2.68_{-0.30}^{+0.33}\%$. We also compute the same fraction with a less stringent cut of 3% ($11.33_{-1.16}^{+0.89}\%$), as the higher number count and lower statistical noise of this fraction permit sharper comparison with future low-redshift clumpy galaxy studies. Our results reveal a sharp decline in $f_{clumpy}$ over $0 < z < 0.5$. The minor merger rate remains roughly constant over the same span, so we suggest that minor mergers are unlikely to be the primary driver of clump formation. Instead, the rate of galaxy turbulence is a better tracer for $f_{clumpy}$ over $0 < z < 1.5$ for galaxies of all masses, which supports the idea that clump formation is primarily driven by violent disk instability for all galaxy populations during this period.
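The flux-ratio cut described in this abstract can be illustrated with a small sketch. It computes a raw clumpy fraction for a toy sample and deliberately omits the completeness correction the authors apply, so it is not their method. The function names, the data layout and the flux values are all hypothetical.

```python
from typing import List, Sequence, Tuple

# Each entry: (total galaxy flux, list of individual clump fluxes), same band and units.
Galaxy = Tuple[float, List[float]]


def is_clumpy(galaxy_flux: float, clump_fluxes: Sequence[float], threshold: float) -> bool:
    """True if at least one clump exceeds the clump-to-galaxy flux ratio threshold."""
    if galaxy_flux <= 0:
        raise ValueError("galaxy_flux must be positive")
    return any(flux / galaxy_flux > threshold for flux in clump_fluxes)


def clumpy_fraction(sample: Sequence[Galaxy], threshold: float) -> float:
    """Raw fraction of galaxies hosting at least one clump above the threshold."""
    if not sample:
        raise ValueError("sample must not be empty")
    n_clumpy = sum(is_clumpy(flux, clumps, threshold) for flux, clumps in sample)
    return n_clumpy / len(sample)


if __name__ == "__main__":
    # Made-up fluxes, for illustration only.
    toy_sample: List[Galaxy] = [
        (100.0, [9.0, 2.0]),
        (100.0, [4.0]),
        (50.0, []),
        (200.0, [5.0]),
    ]
    for cut in (0.08, 0.03):
        print(f"cut = {cut:.0%}: raw clumpy fraction = {clumpy_fraction(toy_sample, cut):.2f}")
```

Lowering the cut from 8% to 3% can only keep or increase the number of galaxies counted as clumpy, which is why the less stringent fraction quoted in the abstract is the larger one.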

preprint, 2022, arXiv

Gems of the Galaxy Zoos -- a Wide-Ranging Hubble Space Telescope Gap-Filler Program

We describe the Gems of the Galaxy Zoos (Zoo Gems) project, a gap-filler project using short windows in the Hubble Space Telescope's schedule. As with previous snapshot programs, targets are taken from a pool based on position; we combine objects selected by volunteers in both the Galaxy Zoo and Radio Galaxy Zoo citizen-science projects. Zoo Gems uses exposures with the Advanced Camera for Surveys (ACS) to address a broad range of topics in galaxy morphology, interstellar-medium content, host galaxies of active galactic nuclei, and galaxy evolution. Science cases include studying galaxy interactions, backlit dust in galaxies, post-starburst systems, rings and peculiar spiral patterns, outliers from the usual color-morphology relation, Green Pea compact starburst systems, double radio sources with spiral host galaxies, and extended emission-line regions around active galactic nuclei. For many of these science categories, final selection of targets from a larger list used public input via a voting process. Highlights to date include the prevalence of tightly-wound spiral structure in blue, apparently early-type galaxies, a nearly complete Einstein ring from a group lens, redder components at lower surface brightness surrounding compact Green Pea starbursts, and high-probability examples of spiral galaxies hosting large double radio sources.

preprint, 2022, arXiv

Hubble Asteroid Hunter: I. Identifying asteroid trails in Hubble Space Telescope images

Large and publicly available astronomical archives open up new possibilities to search and study Solar System objects. However, advanced techniques are required to deal with the large amounts of data. These unbiased surveys can be used to constrain the size distribution of minor bodies, which represents a piece of the puzzle for the formation models of the Solar System. We aim to identify asteroids in archival images from the ESA Hubble Space Telescope (HST) Science data archive using data mining. We developed a citizen science project on the Zooniverse platform, Hubble Asteroid Hunter (www.asteroidhunter.org), asking members of the public to identify asteroid trails in archival HST images. We used the labels provided by the volunteers to train an automated deep learning model built with Google Cloud AutoML Vision to explore the entire HST archive to detect asteroids crossing the field-of-view. We report the detection of 1701 new asteroid trails identified in archival HST data via our citizen science project and the subsequent machine learning exploration of the ESA HST science data archive. We detect asteroids to a magnitude of 24.5, which are statistically fainter than the populations of asteroids identified from ground-based surveys. The majority of asteroids are distributed near the ecliptic plane, as expected, where we find an approximate density of 80 asteroids per square degree. We match 670 trails (39% of the trails found) with 454 known Solar System objects in the Minor Planet Center database; however, no matches are found for 1031 (61%) trails. The unidentified asteroids are faint, being on average 1.6 magnitudes fainter than the asteroids we succeeded in identifying. They probably correspond to previously unknown objects. This work demonstrates that citizen science and machine learning are useful techniques for the systematic search of SSOs in existing astronomy science archives.
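The counts quoted in this abstract can be checked with a few lines of arithmetic, and the 1.6 magnitude difference can be turned into a brightness ratio using the standard magnitude-flux relation (a difference of Δm magnitudes corresponds to a flux ratio of 10^(0.4 Δm)). The variable and function names below are illustrative and not taken from the paper.

```python
# Trail counts as quoted in the abstract.
TOTAL_TRAILS = 1701
MATCHED_TRAILS = 670
UNMATCHED_TRAILS = 1031


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def flux_ratio(delta_mag: float) -> float:
    """Flux ratio corresponding to a magnitude difference (standard relation)."""
    return 10 ** (0.4 * delta_mag)


if __name__ == "__main__":
    print(f"matched:   {percent(MATCHED_TRAILS, TOTAL_TRAILS):.1f}% of trails")
    print(f"unmatched: {percent(UNMATCHED_TRAILS, TOTAL_TRAILS):.1f}% of trails")
    print(f"1.6 mag fainter means about {flux_ratio(1.6):.1f}x less flux")
```

The matched and unmatched trails add up to the 1701 reported detections, and the rounded percentages agree with the 39% and 61% given in the text.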

preprint, 2022, arXiv

Practical Galaxy Morphology Tools from Deep Supervised Representation Learning

Astronomers have typically set out to solve supervised machine learning problems by creating their own representations from scratch. We show that deep learning models trained to answer every Galaxy Zoo DECaLS question learn meaningful semantic representations of galaxies that are useful for new tasks on which the models were never trained. We exploit these representations to outperform several recent approaches at practical tasks crucial for investigating large galaxy samples. The first task is identifying galaxies of similar morphology to a query galaxy. Given a single galaxy assigned a free text tag by humans (e.g. "#diffuse"), we can find galaxies matching that tag for most tags. The second task is identifying the most interesting anomalies to a particular researcher. Our approach is 100% accurate at identifying the most interesting 100 anomalies (as judged by Galaxy Zoo 2 volunteers). The third task is adapting a model to solve a new task using only a small number of newly-labelled galaxies. Models fine-tuned from our representation are better able to identify ring galaxies than models fine-tuned from terrestrial images (ImageNet) or trained from scratch. We solve each task with very few new labels; either one (for the similarity search) or several hundred (for anomaly detection or fine-tuning). This challenges the longstanding view that deep supervised methods require new large labelled datasets for practical use in astronomy. To help the community benefit from our pretrained models, we release our fine-tuning code Zoobot. Zoobot is accessible to researchers with no prior experience in deep learning.
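The first task, finding galaxies similar to a query galaxy, is commonly implemented as a nearest-neighbour search in the learned representation space. The sketch below ranks a toy catalogue by cosine similarity. It does not use Zoobot's API, the abstract does not say which distance measure the authors use, and the three-dimensional vectors and galaxy names are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two representation vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def most_similar(
    query: Sequence[float], catalogue: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k catalogue entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalogue.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical low-dimensional representations; real ones would come from a trained model.
    toy_catalogue = {
        "galaxy_a": [1.0, 0.0, 0.0],
        "galaxy_b": [0.9, 0.1, 0.0],
        "galaxy_c": [0.0, 1.0, 0.0],
    }
    for name, score in most_similar([1.0, 0.05, 0.0], toy_catalogue, k=2):
        print(f"{name}: {score:.3f}")
```

Only the query galaxy needs a human label under this scheme, which matches the abstract's point that the similarity search requires a single new label.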

preprint, 2020, arXiv

Galaxy Zoo Builder: Four Component Photometric decomposition of Spiral Galaxies Guided by Citizen Science

Multi-component modelling of galaxies is a valuable tool in the effort to quantitatively understand galaxy evolution, yet the use of the technique is plagued by issues of convergence, model selection and parameter degeneracies. These issues limit its application over large samples to the simplest models, with complex models being applied only to very small samples. We attempt to resolve this dilemma of "quantity or quality" by developing a novel framework, built inside the Zooniverse citizen science platform, to enable the crowdsourcing of model creation for Sloan Digital Sky Survey galaxies. We have applied the method, including a final algorithmic optimisation step, on a test sample of 198 galaxies, and examine the robustness of this new method. We also compare it to automated fitting pipelines, demonstrating that it is possible to consistently recover accurate models that either show good agreement with, or improve on, prior work. We conclude that citizen science is a promising technique for modelling images of complex galaxies, and release our catalogue of models.

preprint, 2019, arXiv

The HI Morphology and Stellar Properties of Strongly Barred Galaxies: support for bar quenching in massive spirals

Galactic bars can affect the evolution of galaxies by redistributing gas in galaxies, possibly contributing to the cessation of star formation. Recent works point to 'bar quenching' playing an important role in massive disk galaxies like the Milky Way. We construct the largest ever sample of gas rich and strongly barred disc galaxies with resolved HI observations, making use of both the Giant Metrewave Radio Telescope (GMRT) and the Karl Jansky Very Large Array (VLA) to collect data. This sample, called HIRB (HI Rich Barred) galaxies, was identified with Galaxy Zoo (to find galaxies hosting a strong bar) and the Arecibo Legacy Fast Arecibo L-band Feed Array (ALFALFA) blind HI survey (to identify a high HI content). We measure gas fractions, HI morphology and kinematics in each galaxy, and use archival optical data from the Sloan Digital Sky Survey (SDSS) to reveal star-formation histories and bar properties. HIRB galaxies presented here support a picture where bar quenching is playing, or will play, an important role in their evolution. They also support models which show how the presence of cold gas delays and slows the development of strong bars. The galaxies with the lowest gas fractions (still high for their mass) show clear HI holes, dynamically advanced bars and low star formation rates; those with the highest gas fractions show little impact from their bar on the HI morphology, and are actively star-forming. How such unusual galaxies came to be is an open question. Several of the HIRBs have local gas rich companions. Tidal interactions with these lower mass galaxies could result in an early triggering of the bar and/or accretion of HI between them. The role of environment in the evolution of the HIRB galaxies will be explored in a future paper.