Sunday, February 8, 2026

New York City traffic and the future of software development



 It is a truth universally acknowledged that code expands to fill the available capacity to debug it. We’ll get back to software in a minute, but let’s talk about a man first.

Robert Moses was a uniquely powerful, smart, and corrupt figure in mid-20th-century New York City and New York State. He was an unbelievably prolific builder who channeled various public and private resources into the parkways, beaches, and bridges of the New York metropolitan area by every means possible. He had no problem lying, ruining people’s lives, and upending communities if it served his purposes. Under his leadership, the city gained many more roads, wider and better than before. It had more bridges connecting its various parts. It had more parkways.

Robert Moses and his subordinates believed that all this would alleviate the traffic problems the city was already experiencing with the mass production of cars. To their surprise, as they built more roads, more cars seemed to pop up. This offset the change in capacity and left congestion unchanged. They initially believed this to be a good sign. It meant that many people were suddenly able to use the roads, and they assumed the eventual result would be a smoother flow of traffic.

They were wrong.

More roads, they learned slowly, meant more, not less, traffic. This was counterintuitive at the time, but it is well understood today that this is how traffic in heavily populated areas works. It becomes congested up to a point, and adding roads provides a very temporary solution to the problem, if that.

Academics caught wind of this phenomenon and felt the immediate need to name it. Around 1962, economist Anthony Downs formulated "Downs's Law of Peak-Hour Traffic Congestion." By the '90s, the general case became known as induced demand.

For some things, it seemed, increasing the underlying resource’s availability would never be enough.

Fast forward to emerging software development patterns in light of advances in AI. We appear to be more efficient and more productive. We need fewer people to do the same job, and we do it faster than ever. We generate code. More and more code.

The more code we generate, the more bugs there are in the world. The more bugs we find, the more work these code-generating machines will have. To fix them, they will generate more code, and round and round it goes. Therefore, in this software engineer’s opinion, reports of our profession’s death have been greatly exaggerated. The industry overall will end up needing similar numbers of people. The increase in efficiency we currently feel will soon be squashed under masses of code piling ever higher and deeper. There will always be just one last bug to solve, for which the AI apologizes humbly. "It was indeed an oversight on my part," it will say. "I will fix it immediately. Is there anything else you’d like to talk about?"


Sunday, July 27, 2025

The Stravinsky Principle in Software Engineering

(A frame from "Fantasia", set to Stravinsky's "The Rite of Spring", depicting two developers arguing about serialization formats, seconds before the AI meteor hits and wipes us all out.)

---

About 20 years ago, I attended a jazz masterclass by pianist Kenny Werner. That’s where I first heard Stravinsky’s famous line: 
“The more constraints one imposes, the more one frees oneself.”
Werner (and, I assume, Stravinsky) was talking about creativity—how limitations can be a catalyst for it. It’s a beautiful quote that resonates with my own experience. But recently, I’ve started thinking about it in a different way, not as something that fosters creativity, but rather as a hallmark of solid engineering.

---

The gist:
The more technical constraints I have on what I can do, the more freedom I have to not worry about breaking things, and the more confidently I can change code and behavior in the future.

---

Programming languages as built-in constraints


Languages are human inventions, and their design choices reflect certain beliefs about how code should be written.

In C, you must declare every variable’s type explicitly. That’s a constraint.

In C++, you can often use `auto` and let the compiler infer the type. In Go, you can skip some of that too, but you still need to compile before running—another constraint.

In Python? No declarations. No compilation. Just run it. If types don’t match, the interpreter will figure it out when (or if) it gets there. Amazing!

---

At first, that feels great—less ceremony, faster iteration, fewer keystrokes. But once you start working on anything bigger than a toy script, it bites back. Reading unfamiliar code becomes a chore—good luck figuring out what type that variable is supposed to be. And if you make a type-related mistake in an execution branch that isn’t covered by tests, you’ll only find out in production, because there’s no “annoying compiler” to catch these things upfront.

---

Yes, Python has type annotations, smart IDEs, and libraries like pydantic. But that sort of proves the point: as Python grew into a language for large systems, people added tools to enforce constraints—because constraints are useful. They give you freedom.
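To make that concrete, here is a minimal sketch of what those bolted-on constraints buy you (the function and the bad call are invented for illustration, not taken from any real codebase):

    # With annotations, a checker like mypy or a smart IDE flags the mismatch
    # before the code ever runs; without them, the mistake surfaces only on
    # the execution branch that happens to hit it.
    def total_price(prices: list[float], discount: float) -> float:
        return sum(prices) * (1 - discount)

    # total_price([10.0, 20.0], discount="0.1")  # flagged statically; otherwise a runtime TypeError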

---

Same thing with serialization


Take something like Protocol Buffers. You define a schema, compile it into code, and make changes in controlled ways. Lots of work upfront!
Or you can just dump objects to JSON and move on—fast, simple, human-readable.

But then time passes. Someone adds a field, renames another, and suddenly our new code needs to read old data on disk… good luck figuring out what the old format was.

That up‑front “pain” of a strict schema gives you long‑term freedom: easier debugging, safer migrations, faster changes, and fewer surprises in production.
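A tiny Python sketch of the schema-less failure mode described above (the field names are invented for illustration):

    import json

    # Version 1 of the code wrote records like this to disk:
    old_record = json.dumps({"name": "Ada", "phone": "555-1234"})

    # Version 2 renamed "phone" to "phone_number"; nothing forces anyone to
    # notice, so every old record silently comes back without a phone number.
    record = json.loads(old_record)
    phone = record.get("phone_number")  # None, with no error anywhere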

---

Stravinsky for software engineers


So here’s my takeaway:

The more constraints we embrace early on, the freer we are later.

Languages with stricter typing, serialization formats with schemas, databases with defined structure—these may feel like a burden while writing code.
But those constraints are what make future maintenance, debugging, and extension so much easier.

Friday, December 30, 2022

McHeron Algorithm for Square Root

Heron

Famously, Heron's method of computing the square root of some number x goes like this:

1. Guess the root, we'll denote the guess by g

2. Compute the quotient q = x/g

3. Update your next guess to be the arithmetic mean of q and g. So g := (x/g + g)/2

4. Goto 2

This can be viewed as a special case of the Newton–Raphson method, and indeed the results become very accurate very fast.
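In code, the whole loop is just a few lines (a sketch using floating point for simplicity):

    def heron_sqrt(x, guess, rounds=10):
        g = guess
        for _ in range(rounds):
            g = (x / g + g) / 2  # next guess: arithmetic mean of g and x/g
        return g

    print(heron_sqrt(2, 1.5))  # 1.41421356..., accurate after just a few rounds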

Mckay

Somewhat less famous is the anecdote about Mckay's theorem.
In this paper by Laurence Sherzer, the author tells the story of an 8th grader (in 1973) named Robert Mckay, who suggested a way to find a number that lies between two fractions. Suppose the fractions are a/b and c/d; Mckay's suggestion is to use (a+c)/(b+d).
This is a mathematical education paper, so its focus is on how the class and the teacher approached the seemingly weird idea of adding numerators and denominators, and how they explained to themselves why this seems to 'magically' work.
The short version of the explanation is: this amounts to taking a weighted average of the two fractions, where the weights are the denominators. When the denominators are equal, this is exactly the arithmetic mean. When they differ by a lot - the result is much closer to the fraction with the larger denominator.
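Spelled out as a one-line identity, the 'magic' is just:

    (a + c)/(b + d) = (b·(a/b) + d·(c/d)) / (b + d)

so the mediant is the average of a/b and c/d, weighted by their denominators b and d.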

McHeron?

Since taking the arithmetic mean of two fractions, as Heron's method requires, is annoying at times, why not use Mckay's very simple method instead, dubbing the result the McHeron Algorithm?
Mathematically, it will give an increasingly better guess in each round, even though it may not be the arithmetic mean.
How would that look?
Suppose we want to compute the square root of 2:
1. Our first guess is 3/2
2. The quotient is 2/(3/2) which is 4/3 ('just invert and multiply' as they say)
3. Our next guess needs to be between 3/2 and 4/3, so we use Mckay's method to get 7/5
4. Our next guess is therefore 7/5
5. The next quotient is 2/(7/5) which is just 2*5/7 = 10/7
6. Using Mckay's theorem - our next guess will be (7+10)/(5+7) = 17/12
7. The next quotient is 2/(17/12) which is just 2*12/17 = 24/17
8. Using Mckay's - (17+24)/(12+17) = 41/29

And indeed 41/29 = 1.41379...
which approximates the square root of 2 to 2 decimal places, which is not bad.
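Here is the same iteration as a short Python sketch (using the standard fractions module; the function name is mine):

    from fractions import Fraction

    def mcheron_sqrt(x, guess, rounds):
        g = Fraction(guess)
        for _ in range(rounds):
            q = Fraction(x) / g                          # the Heron quotient x/g
            g = Fraction(g.numerator + q.numerator,      # Mckay's mediant instead
                         g.denominator + q.denominator)  # of the arithmetic mean
        return g

    print(mcheron_sqrt(2, Fraction(3, 2), rounds=3))  # 41/29, as in the example above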

Cool, isn't it?

But of course, life is never quite so simple

I specifically chose a nice example where the algorithm converges relatively fast.
Had I chosen 715 (whose square root is 26.7394...) as the initial number, it would have taken me 100 iterations of the algorithm just to get to 26.76..., and I would be dealing with denominators with 145 digits by then.
Why would the algorithm perform so badly?
Well, remember how Mckay's method introduces a bias toward large denominators? In the algorithm above, the quotient's denominator ends up larger than the guess's denominator by a factor of roughly the square root of the input number, so the larger the input number, the greater the bias toward the quotient in the 'Mckay' step of the calculation.

Heron's algorithm has quadratic convergence (which is amazing), and I don't think it would be difficult to show exactly how bad the convergence of the McHeron algorithm is (I suspect that asymptotically, to get decent results, you would have to run as many iterations as the magnitude of the result, so O(2^(n/2)) iterations for an input with n bits, which is terrible), but I should go shopping for the weekend.

Silver Lining 

For small numbers (with small denominators) this works nicely, for the same reason: when the denominators are close, Mckay's method gives a decent approximation of the arithmetic mean.

That's all. I'd love to hear thoughts in the comments, or links if you've seen this exact method somewhere else - I haven't (and for good reason: it yields terrible results in most cases).

Update: a Twitter Tweak

The user @k_yaakov suggested on Twitter bringing the denominators to roughly the same scale by multiplying by the appropriate power of 10. Since determining the correct power of 10 and multiplying by it can be done by counting the number of digits and appending zeros, respectively, this is very easy to do in practice. The consequence is that the weights in Mckay's weighted mean now differ by a factor of 10 at most, which bounds the bias and yields a significantly better convergence rate.
Taking the adversarial example of 715 from before: getting to 2 decimal places of precision now requires only 17 iterations, compared to over 100 iterations in the previous version of McHeron. Very nice!
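Here is a sketch of the tweak as I understand it (the helper name is mine):

    # Scale the fraction with the smaller denominator by a power of 10 so that
    # both denominators have the same number of digits, then take the mediant
    # as before. The values are unchanged, but the mediant's weights now differ
    # by a factor of 10 at most.
    def balanced_mediant(a, b, c, d):
        shift = len(str(d)) - len(str(b))
        if shift > 0:
            a, b = a * 10**shift, b * 10**shift
        elif shift < 0:
            c, d = c * 10**(-shift), d * 10**(-shift)
        return a + c, b + d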

Monday, January 31, 2022

3 Golden Rules

"Just as I cannot step in the same river twice,  the person who created this bug and I are two separate beings in the fabric of the universe. Therefore, I will also not be the one who fixes it."

         Heraclitus of Ephesus, a programmer


I don't like code manifestos.

Rina Artstain once tweeted that we only think we know certain things about parenting because, as luck would have it, we haven't had a child to whom they don't apply.
That's how I feel about code manifestos.

However, I do have a manifesto of my own, short though it may be, so I want to frame it differently. What follows are not the three rules I think everyone should follow to get "cleaner" code or to improve some other metric of goodness. Rather, these are three things I like to see in other people's code, and that annoy me when I see them blatantly disregarded.

Here we go.

1.

"לֹא עָלֶיךָ הַמְּלָאכָה לִגְמוֹר, וְלֹא אַתָּה בֶן חוֹרִין לִבָּטֵל מִמֶּנָּה."

פרקי אבות, ב' טז'

"It is not incumbent upon you to finish the task, but neither are you free to absolve yourself from it."

Pirkei Avot 2:16

This is one of the greatest quotes I know and it sounds so much better in Hebrew.


2.

"Entities should not be multiplied beyond necessity."

William of Ockham

Yes, this is the OG Ockham's razor. How amazing it is that a Franciscan friar who lived 700 years ago foresaw the harms of overusing OOP and polymorphism.


3.

"Writing good code is superior to writing bad code.
Deleting code is superior to writing good code.
Superior to all is concluding that some code does not have to be written at all."

Confucius

Well, not Confucius, but if one wants people to listen, one should attribute their thoughts to someone famous, preferably dead.

I'm pretty sure Oscar Wilde said that one. 




Tuesday, June 23, 2020

Math Shmath - My Podcast

During the first COVID-19 days, I had a couple of weeks' hiatus between two jobs, which I used to create a popular-math podcast in Hebrew that I call "Math Shmath".
There are only 5 episodes so far, but I hope to record some more in the future.
The intended audience is people with no formal math background, but through a listener survey I found that about half of the respondents had at least undergraduate-level studies in math or the sciences.

מתמטיקה שמתמטיקה

The podcast has had over 5k downloads so far, and if you are a Hebrew speaker, I suggest you give it a try. It is available on iTunes, Spotify, and all the usual places, and I'm told that kids with math tendencies, ages 10-15, like it.





Saturday, October 26, 2019

Zero Knowledge Proofs Via A Toy Example: An Overdue Video

I wrote a hands-on tutorial for Zero-Knowledge proofs more than a year ago on this blog (in four parts: I, II, III and IV).
It subsequently turned into a talk at the fifth Algorithms IL meetup, which was recorded on video.
So here's the video, courtesy of the Algorithms IL team, and Tzipi Zanyovka from Waze, which hosted the event.

Thursday, October 4, 2018

A Hands-On Tutorial for Zero-Knowledge Proofs: Appendix

This is the final part of this hands-on tutorial. I will assume from now on that you have read Part I, Part II, and Part III of this series.

As promised, this post will deal with:

  1. Some tweaks to the protocol presented in the previous posts.
  2. A complexity analysis of the protocol.
  3. A small optimization.
  4. A few words about modern ZK proving protocols.


Lior's Tweaks


A colleague at Starkware, Lior Goldberg, pointed out a caveat in the zero-knowledge aspect and suggested a nice simplification of the protocol. Here they are:

A ZK Fix


Suppose the first two numbers in the problem instance are 5 and 6, and that the random shift $r$ is chosen from the range $0..10$.
Now if the random query $i$ happens to be $1$, then the prover is required to reveal the second and third elements in the witness. If they happen to be 15 and 21, then the verifier knows immediately that 5 and 6 (from the problem instance) belong to the same side in the solution. This violates the zero knowledge property that we wanted.
This happened because we chose uniformly at random from a very small range, and $r$ happened to be the maximal number in that range.

There are two ways to solve this. One is by choosing some arbitrary number and doing all computations modulo that number. A simpler way would be choosing $r$ from a huge domain, such as $0..2^{100}$, which makes the probability of getting a revealing $r$ negligible.
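As a sketch, the second option is a one-liner (in real code a cryptographic randomness source such as the secrets module would be preferable):

    import random

    # Drawing the shift from a huge domain makes a revealing value of r
    # vanishingly unlikely.
    r = random.randint(0, 2**100)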


Simplify By Having A Cyclic List


Our witness originally had $n + 1$ elements: the first was a random number, and the rest were the partial sums of the dot product of the problem and the assignment (shifted by that initial random number).
This meant we had two types of queries: one checked that two consecutive elements in the list differ, in absolute value, by the corresponding element in the problem list; the other checked that the first and last elements are equal.

As Lior pointed out, it is much more elegant to omit the last element from the witness entirely, and if $i = n$ - check that the first and last elements in the witness differ, in absolute value, by the last element in the problem instance. Essentially, this is like thinking of the witness as cyclic. The nice thing about this is that now we only have one type of query - a query about the difference between two consecutive elements (indices taken modulo n) in the witness.
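In code, the single check might look something like this (a sketch in the spirit of the tutorial's Python, with 0-based indices; the function name is mine):

    # `witness` now has n elements, `problem` is the list of n input numbers.
    def check_query(problem, witness, i):
        n = len(problem)
        return abs(witness[(i + 1) % n] - witness[i]) == abs(problem[i])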


Proof Size / Communication Complexity


We'd like to analyze the size of the proof that our code generates. This is often referred to as communication complexity, because the Fiat-Shamir Heuristic (described in Part III) transforms the messages of an interactive protocol into a proof, making these two terms interchangeable in this context.

So, for each query, the proof stores:
  • The value of i.
  • The value of the $i$-th element in the witness and of the $((i + 1) \bmod n)$-th element.
  • Authentication paths for both elements.
The authentication paths here are the heavy part. Each of them is a $\log(n)$-element long list of 256-bit values.
As was discussed in the last post, to get a decent soundness, the number of queries has to be roughly $100n$. 
Putting these two together, the proof size will be dominated by the $\sim 200 \cdot n \cdot \log(n)$ hashes that form the authentication paths.
So a proof that one knows an assignment to a Partition Problem instance with 1000 numbers will require roughly $2,000,000$ hashes, which translates to about 64 megabytes of data.
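For the record, here is that arithmetic as a tiny script (just the rough estimates from above, nothing taken from the actual implementation):

    from math import log2

    n = 1000
    queries = 100 * n                # queries needed for decent soundness
    hashes = 2 * queries * log2(n)   # two authentication paths per query
    print(round(hashes))             # ~2,000,000 hashes
    print(round(hashes * 32 / 1e6))  # 256-bit (32-byte) hashes -> ~64 MB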


Small Merkle Optimization


Since Merkle authentication paths, somewhat surprisingly, make up the vast majority of the proof, maybe we can reduce their number by a little.

Note that all queries (except for one) ask about consecutive leaves in the tree. 
Consecutive leaves have their LCA (least common ancestor), on average, at height $\frac {\log n} {2}$. Up to the LCA, their authentication paths may differ, but from the LCA up to the root, their authentication paths are identical, so we're wasting space writing both in the proof.
Omitting the path from the LCA to the root from one of them leaves roughly $1.5 \cdot \log(n)$ hashes per query instead of $2 \cdot \log(n)$, bringing the proof size down to $150 \cdot n \cdot \log (n)$, which is a nice 25% improvement.

Implementing this optimization, as well as Lior's tweaks, is left - as they say in textbooks - as an exercise for the reader.


Modern Protocols




Modern ZK proving protocols, such as ZK-SNARKs, ZK-STARKs, Bulletproofs, Ligero, Aurora, and others, are often compared along these four axes:

  • What type of statements can be proved using the protocol.
  • How much space the proof takes up.
  • How long it takes to create a proof.
  • How long it takes to verify a proof.
Often the topic of trusted setup is discussed, but we won't get into that here.

Let's see how our toy-protocol fares:

Which statements can be proved?


In the toy protocol, only knowledge of a solution to a Partition Problem instance could be proved. This is in contrast with most modern protocols, where one can prove knowledge of an input that satisfies some arbitrary arithmetic circuit, or even that a specific program ran for $T$ steps and produced a specified output (this is what ZK-STARKs do).
Well, you may say, if you can prove one NP-complete problem (and the Partition Problem is one) - you can prove them all, due to polynomial-time reductions. And theoretically speaking, you would be right. However, in the practical world of ZK proofs, all these manipulations have costs of their own, and conversions often incur a blow-up of the problem, since "polynomial reduction" is a theoretical term that can translate to an impractical cost. For this reason, modern protocols make an effort to take as input more expressive forms (such as arithmetic circuits and statements about computer programs).

Space


As the analysis showed, our proof takes up $O(n \log (n))$ space, whereas in most modern protocols, the proof size is somewhere between constant and polylogarithmic in $n$ (e.g. $O(\log ^2 (n))$).
This huge gap is what makes the proposed protocol nothing more than a toy example that - while demonstrating certain approaches and tricks - is useless for any real application.
You can trace this gap to the fact that we need a linear number of queries, each costing a logarithmic number of hashes (the Merkle authentication paths).
The approach I took was inspired by tricks from the ZK-STARK protocol, which is slightly more expensive than others in terms of proof size, but is expressive, requires relatively short prover time, and has very short verifier time. In STARK, the lion's share of the proof is indeed comprised of Merkle authentication paths, but great care is taken so that the number of queries is minuscule.

Prover Running Time


In our protocol it is roughly $O(n \log (n))$, which is not far from modern protocols.


Verifier Running Time


In our protocol it is linear in the proof size, so $O(n \log n)$, which is not so good. Recall that, at least in the context of blockchains, a proof is written once but verified many times (by miners, for example). Modern protocols thus strive to make the verifier's workload as small as possible without impeding soundness.





This concludes what I hoped to cover in this tutorial. It was fun to write and code. Let's do it again sometime. :)