Add docstring examples for Aggregate statistical and regression functions #1417

Open
ntjohnson1 wants to merge 1 commit into apache:main from rerun-io:nick/docstrings-agg-stat


@ntjohnson1
Contributor

Which issue does this PR close?

Rationale for this change

Add example usage to docstrings for Aggregate statistical and regression functions to improve documentation.

What changes are included in this PR?

The first PR was basically adding a docstring to everything in functions. I broke it apart into a PR for the infra (already merged), and then reviewed and merged an example PR that added the docstrings in parts. This is now the follow-up: a handful of PRs for the remaining functions in functions.py. Everything is co-authored with Claude, since I used Claude to extend the handwritten examples I wrote for reference and to split apart the large PR rather than doing it manually.

I've reviewed all the code prior to opening the PR.

Are there any user-facing changes?

No

…ions

Add example usage to docstrings for Aggregate statistical and regression functions to improve documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@kosiew kosiew left a comment


@ntjohnson1
Thank you for your contribution.

Comment on lines +2125 to +2126
>>> builtins.round(
... result.collect_column("v")[0].as_py(), 4

This example can be simplified by choosing input values with an exact covariance result instead of importing builtins just to round the output.

That would make covar_pop read more like the surrounding examples.
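To make the suggestion concrete, here is a minimal pure-Python sketch. It is not the DataFusion implementation, and `covar_pop` and `covar_samp` below are hypothetical stand-ins written only for illustration. It shows why the choice of inputs matters: with inputs like those used in the covar example, the population covariance is 2/3 and needs rounding, while a different second column gives an exact value.

```python
def _co_moment(xs, ys):
    """Sum of products of deviations from the means (illustrative helper)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def covar_pop(xs, ys):
    """Population covariance: divide the co-moment by n."""
    return _co_moment(xs, ys) / len(xs)


def covar_samp(xs, ys):
    """Sample covariance: divide the co-moment by n - 1."""
    if len(xs) < 2:
        raise ValueError("sample covariance needs at least two points")
    return _co_moment(xs, ys) / (len(xs) - 1)


a = [1.0, 2.0, 3.0]

# Co-moment is 2.0, so the population covariance is 2/3: not exactly
# representable, hence the rounding in a doctest.
inexact = covar_pop(a, [4.0, 5.0, 6.0])

# Co-moment is 6.0, so the population covariance is exactly 2.0.
exact = covar_pop(a, [3.0, 6.0, 9.0])

print(round(inexact, 4), exact)
```

With inputs chosen this way, a docstring example could compare the result directly, without a rounding step.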

Comment on lines +2166 to +2172
---------
>>> ctx = dfn.SessionContext()
>>> df = ctx.from_pydict({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]})
>>> result = df.aggregate(
... [], [dfn.functions.covar(dfn.col("a"), dfn.col("b")).alias("v")])
>>> result.collect_column("v")[0].as_py()
1.0

Since covar() is an alias of covar_samp(), and the new example is a verbatim duplicate of the covar_samp function’s example, do you think keeping the example only on covar_samp() and leaving the alias docstring short would avoid doc drift?
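One possible reading of this suggestion, sketched in plain Python: the canonical function carries the example, and the alias keeps a short docstring that points to it. The functions below are hypothetical stand-ins that operate on lists. The real functions in functions.py build expressions and are not reproduced here.

```python
def covar_samp(xs, ys):
    """Compute the sample covariance of two equal-length sequences.

    Examples
    --------
    >>> covar_samp([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    1.0
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_moment = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return co_moment / (n - 1)


def covar(xs, ys):
    """Compute the sample covariance.

    This is an alias for :py:func:`covar_samp`; see that function for an example.
    """
    # Delegate to the canonical function so behavior and docs stay in one place.
    return covar_samp(xs, ys)
```

Because the alias carries no example of its own, a later change to the canonical example cannot leave a stale copy behind.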
