Site update: added Yandex and Twitter login support to the comment system.

Well, I've been testing lots of different hydraulic erosion simulation code on my noise based terrain. I started off with a pretty complex sim I put together that modeled water sources, water quantity, water velocity, amount of sediment carried in the water, soil deposition rates, etc. I tested this on 8000x8000 maps of f32 data.

Even though my code is fully threaded (current machine: 8 core/16 thread), my simulations were slow. Incredibly slow. Like possibly approaching the heat death of the universe for my simulation to end slow. After several iterations of simplifying the simulation I had a realization. I don't need some crazy full map erosion process that takes forever. I mostly want mountains that don't suck and realistic looking rivers. After I get that nailed I can get to work implementing terrain for specific biomes.

My new simplified river plan is this, for each river:

Starting point is a mountain peak which receives precipitation
Chart a path downhill
Widen river as conditions are met
End the river at an existing body of water, a depression to form a lake, or the edge of map
Also, erosion only at this time
When water physics is added in I'll add evaporation and take another look at sediment deposits
A modest speed increase: I was sharing one map between 16 threads and using reader/writer locking semantics. I didn't measure contention, but I suspected a fair amount. So now I create 16 copies of the map, give each thread their own version to work on, and merge the maps back together at the end. I got around a 10% performance boost. RAM is meant to be used right? ....right?

I'm not going to traverse every single tile/pixel downhill. First off, it's friggin' slow. Secondly, it produces some weird shaped rivers. That can be fixed by running a gazillion iterations, but that's one of the things I'm trying to avoid. So instead I'm going to search for the lowest point in a set radius, draw a line to it, and keep doing that until I have a river of attached line segments.

First I need to be able to draw a line from x1, y1 to x2, y2, preferably as quickly as possible. Luckily for me, some guy named Jack Elton Bresenham came up with a fast algorithm called Bresenham's line algorithm back in 1962.

This page has some cool examples of Bresenham's algorithm in C. (for some reason uBlock hates this page now, but as of this blog post date that page is safe and ad-free)

Here's my Rust implementation based on the code from that page:

// adapted from http://members.chello.at/%7Eeasyfilter/bresenham.html
#[inline]
pub fn get_line_unchecked(from: (usize, usize), to: (usize, usize)) -> Vec<(usize, usize)> {
    let mut line: Vec<(usize, usize)> = vec![];
    let mut ix0 = from.0 as isize;
    let mut iy0 = from.1 as isize;
    let ix1 = to.0 as isize;
    let iy1 = to.1 as isize;
    let dx: isize = ix1.abs_diff(ix0) as isize;
    let mut sx: isize = 1;
    if ix0 > ix1 {
        sx = -1;
    }
    let dy: isize = -(iy1.abs_diff(iy0) as isize);
    let mut sy: isize = 1;
    if iy0 > iy1 {
        sy = -1
    }
    let mut err = dx + dy;
    let mut e2: isize = 0;
    loop {
        line.push((ix0 as usize, iy0 as usize));
        if ix0 == ix1 && iy0 == iy1 {
            break;
        }
        e2 = 2 * err;
        if e2 >= dy {
            err += dy;
            ix0 += sx;
        }
        if e2 <= dx {
            err += dx;
            iy0 += sy;
        }
    }
    line
}

I called it get_line_unchecked because it's naive. I intend to draw lines on a specific canvas/map size within only the positive coordinate plane. This function doesn't know that. It just gives me a vector of points from one point to the other. Since I intend to use each of these points as a direct memory index into my map I want to make sure I return valid points. If a line extends off my map and I try to use that as a map index I'm going to experience 1 of 2 possible errors: Best case scenario, I index the wrong location on the map and modify something where I didn't intend to. Worst case scenario is I attempt to index beyond my maximum map size, which results in an array out of bounds error, or memory access exception, or whatever happens in Rust...and then the application crashes. Also, this function uses signed integers (to support the entire coordinate plane), so if I get a negative coordinate and cast it to an unsigned integer for my map index one of those 2 possible errors I just wrote about is going to happen. It's definitely more likely to be the crashy kind of error though.

So I updated the code a bit. I still use that previous code when I know it's safe and I want extra speed. Otherwise I use this one:

// adapted from http://members.chello.at/%7Eeasyfilter/bresenham.html
/// if remove_out_of_bounds = false, points outside of map will be placed on edge of map
/// if remove_out_of_bounds = true, points outside of map will be ignored
#[inline]
pub fn get_line(
    from: (usize, usize),
    to: (usize, usize),
    map_width: usize,
    remove_out_of_bounds: bool,
) -> Vec<(usize, usize)> {
    let mut line: Vec<(usize, usize)> = vec![];
    let mut ix0 = from.0 as isize;
    let mut iy0 = from.1 as isize;
    let ix1 = to.0 as isize;
    let iy1 = to.1 as isize;
    let dx: isize = ix1.abs_diff(ix0) as isize;
    let mut sx: isize = 1;
    let imap_width = map_width as isize;
    let mut clamped: bool = false;
    let mut try_clamp = |coord: isize| -> usize {
        if coord > imap_width - 1 {
            clamped = true;
            map_width - 1
        } else if coord < 0 {
            clamped = true;
            0
        } else {
            coord as usize
        }
    };

    if ix0 > ix1 {
        sx = -1;
    }
    let dy: isize = -(iy1.abs_diff(iy0) as isize);
    let mut sy: isize = 1;
    if iy0 > iy1 {
        sy = -1
    }
    let mut err = dx + dy;
    let mut e2: isize = 0;
    let mut skip_point: bool;
    loop {
        skip_point = false;
        if remove_out_of_bounds
            && (ix0 > imap_width - 1 || ix0 < 0 || iy0 > imap_width - 1 || iy0 < 0)
        {
            skip_point = true;
        }
        if !skip_point {
            line.push((try_clamp(ix0), try_clamp(iy0)))
        };
        if ix0 == ix1 && iy0 == iy1 {
            break;
        }
        e2 = 2 * err;
        if e2 >= dy {
            err += dy;
            ix0 += sx;
        }
        if e2 <= dx {
            err += dx;
            iy0 += sy;
        }
    }
    if !remove_out_of_bounds && clamped {
        line.sort();
        line.dedup();
    }
    line
}

My points are usize tuples because for better or worse, Rust forces slice indexes to be usize... and I use my points as indexes. I only have a width parameter because my game maps are square. I use the try_clamp closure so that I can keep track if any points have extended beyond the map. I keep track because if I'm not removing out of bounds map points and placing them on the map edge I could end up with duplicate points. The whole reason I want these points is so I can manipulate those parts of the map. I don't want to perform those manipulations multiple times at the same location on accident. Therefore, if any point locations have been clamped, I know that I need to sort my vector and remove possible duplicates.

Here are some totally random lines representing a sort of rudimentary river:

line_grid_scaled

Okay, next thing I need to do is figure out how to select a circle of pixels/tiles. For every point along my "river", depending on how wide the river is, I want to select a circle of tiles, and erode earth from that circle, eroding less earth the further out from the center of the circle that I go. I sort of picture it like dragging an ice cream cone shape along the path of the river, carving out the earth using that shape.

On that page I shared earlier there's a version of Bresenham's algorithm for circles. Here's my Rust version for a filled circle:

#[inline]
//adapted from http://members.chello.at/%7Eeasyfilter/bresenham.html
pub fn get_full_circle(xm: usize, ym: usize, mut r: isize, width: usize) -> Vec<(usize, usize)> {
    let xm = xm as isize;
    let ym = ym as isize;
    //let mut r = r as isize;
    let width2 = width as isize;

    let mut x = -r;
    let mut y = 0;
    let mut err: isize = 2 - 2 * r;
    let mut empty_circle: Vec<(usize, usize)> = vec![];
    let mut full_circle: Vec<(usize, usize)> = vec![];
    let mut clamped: bool = false;
    // need to know if clamp has ocurred, as this can create duplicate coordinates
    let mut try_clamp = |coord: isize| -> usize {
        if coord < 0 {
            clamped = true;
            0
        } else if coord >= width2 {
            clamped = true;
            width - 1
        } else {
            coord as usize
        }
    };
    while x < 0 {
        empty_circle.push((try_clamp(xm - x), try_clamp(ym + y)));
        empty_circle.push((try_clamp(xm - y), try_clamp(ym - x)));
        empty_circle.push((try_clamp(xm + x), try_clamp(ym - y)));
        empty_circle.push((try_clamp(xm + y), try_clamp(ym + x)));
        r = err;
        if r <= y {
            y += 1;
            err += y * 2 + 1;
        }
        if r > x || err > y {
            x += 1;
            err += x * 2 + 1;
        }
    }
    if empty_circle.len() == 0 {
        return empty_circle;
    }
    // sort by X axis. this allows easy dupe removal and quickly getting coords inside the circle
    empty_circle.sort();
    if clamped {
        empty_circle.dedup();
    }
    let mut lowest = usize::MAX;
    let mut highest = 0;
    let mut current = empty_circle.first().unwrap().0;
    //find lowest Y and highest Y for each X coordinate, and grab all the coords between (inclusive)
    for p in empty_circle {
        if p.0 != current {
            for y in lowest..=highest {
                full_circle.push((current, y));
            }
            lowest = usize::MAX;
            highest = 0;
            current = p.0;
        }
        if p.1 < lowest {
            lowest = p.1;
        }
        if p.1 > highest {
            highest = p.1;
        }
    }
    for y in lowest..=highest {
        full_circle.push((current, y));
    }
    full_circle
}

This code looks quite a bit different because I want a filled circle, not just a circle outline. Basically for each horizontal (X) step across the circle diameter I make, I fill in the whole range of vertical (Y) pixels. You could do it the other way around too, of course. I think it could be optimized a bit, but it works. Oh yeah, and like before I've got my clamping code in there in case I try to get a filled circle too close to the map edge.

Here's an example:

bresenham_filled_circle

I'm going to be drawing lots of circles though, and I wasn't sure if that was the fastest way to get a filled circle. So I wrote a more naive version. Basically, if I want a filled circle of radius R, first get the tiles/pixels in a square 2R X 2R centered where I want the circle. Then calculate the distance of each pixel from the center. If it's less than or equal to the circle radius I want, then it's inside the circle.

Here's my code for that version:

#[inline]
pub fn get_full_circle_naive(
    xm: usize,
    ym: usize,
    r: f32,
    width: usize,
    with_nipples: bool,
    ignore_center: bool,
) -> Vec<(usize, usize)> {
    let mut full_circle: Vec<(usize, usize)> = vec![];
    let mut nipples: Vec<(usize, usize)> = vec![];
    let startx: usize;
    let starty: usize;
    let endx: usize;
    let endy: usize;
    let r2 = r as usize;
    if xm >= r2 {
        startx = xm - r2;
    } else {
        startx = 0;
    }
    if ym >= r2 {
        starty = ym - r2;
    } else {
        starty = 0;
    }
    if width > xm + r2 {
        endx = xm + r2
    } else {
        endx = width - 1;
    }
    if width > ym + r2 {
        endy = ym + r2;
    } else {
        endy = width - 1;
    }
    let mut dist: f32;
    for x in startx..=endx {
        for y in starty..=endy {
            if ignore_center && x == xm && y == ym {
                continue;
            }
            dist = distance((x, y), (xm, ym));
            if dist < r {
                full_circle.push((x, y));
            } else if with_nipples && dist == r {
                nipples.push((x, y));
            }
        }
    }
    if with_nipples {
        full_circle.append(&mut nipples);
    }
    full_circle
}

Oh right, and my terribly complex distance function:

#[inline(always)]
pub fn distance(pos1: (usize, usize), pos2: (usize, usize)) -> f32 {
    let x1 = pos1.0 as f32;
    let x2 = pos2.0 as f32;
    let y1 = pos1.1 as f32;
    let y2 = pos2.1 as f32;
    ((x1 - x2).powi(2) + (y1 - y2).powi(2)).sqrt()
}

Here's what it looks like rendered:

naive_filled_circle_with_nipples

Well, that's what it looked like with the first version I made. I call it the nipple version because well...I mean look at it. I usually like nipples, but not those ones. So I modified the code a bit, version 2 is what you see above. And here's an example of calling the function + with_nipples = false:

naive_filled_circle

Much better. The Bresenham one still looks the best IMO though. What about performance?

Here's my janky test code (using nanorand crate). I didn't look at the assembly code that was emitted, but I tried to stop the compiler from cheating:

fn test_circles(test_num: usize, r: isize) {
    debug_assert!(r > 0);
    let rf = r as f32;
    let mut points: Vec<(usize, usize)> = vec![];
    let mut rng = nanorand::tls_rng();
    for _ in 0..test_num {
        points.push((rng.generate_range(0..10_000), rng.generate_range(0..10_000)));
    }
    let mut circle: Vec<(usize, usize)>;
    let mut len: usize = 0;
    let start = Instant::now();
    for x in 0..test_num {
        circle = utils::get_full_circle(points[x].0, points[x].1, r, 11000);
        len += circle.len();
    }
    let duration = start.elapsed();
    println!("len = {}", len);
    println!(
        "{} iterations for utils::get_full_circle() took: {:?}",
        test_num, duration
    );

    len = 0;
    let start = Instant::now();
    for x in 0..test_num {
        circle = utils::get_full_circle_naive(points[x].0, points[x].1, rf, 11000, true, false);
        len += circle.len();
    }
    let duration = start.elapsed();
    println!("len = {}", len);
    println!(
        "{} iterations for utils::get_full_circle_naive() took: {:?}",
        test_num, duration
    );
}

I tested with a large number of smallish circles, and a smaller number of larger circles, results were not terribly different. In debug mode with opt-level = 0, the naive filled circle code is just over 4x slower! Ouch. I tried integer math that I ported from C for sqrt and distance calculations, and it was freakishly slower yet. In release mode with opt-level = 3, the bresenham filled circle code is 28% faster with MSVC toolchain, and it's about 10% faster with GNU toolchain. Interestingly, the integer code is slower with GNU, and the floating point code is faster with GNU, explaining the narrowed performance difference.

This brings me back to the first picture I posted above. My sharp angled, not very river-like line. This thing:

line_grid_scaled

I'm no riverologist, but I don't think rivers usually have those sharp angles. So now I need to smooth em out. I did a bit of searching and it looks like I want some cubic splines. I used the first crate I found, cubic_spline and after running those same points through it, here's my result:

spline_grid_scaled

Much better. Of course, after I finished up that rendering code, I found another crate called splines. I haven't tested it yet, but I will, because I'm curious about any differences. Next post I'll hopefully have some mountainy mountains and rivery rivers to show off.

In other news, rust-analyzer started doing this thing where it would reindex everything every new line of code I typed. Sucked a lot. I trashed everything Rust related on the machine and started out clean, now my coding sanity is restored. Well, as much as it could be anyhow.

Bresenham, splines, and nipples, oh my!